ILMerge is a well-known tool for merging several pure managed .NET DLLs into a single file.
You specify the main assembly (also called the primary assembly) and a set of additional assemblies. ILMerge then takes the primary assembly, adds the code, metadata, and so on from the other assemblies to it, and writes the result to the target assembly.
So far so good, but many wonder whether it's possible to embed other kinds of files, especially unmanaged DLLs. No, that's impossible because of the nature of ILMerge: it works at the .NET IL level.
This article shows how virtualization gets around that restriction. We will build a tool called BxILMerge that does the same job as ILMerge but supports files of any type.
You can download the code from GitHub.
Idea: Virtualization + Mono.Cecil
Using BoxedApp SDK, the developer creates a virtual file for each embedded file. A convenient place to store the content of each file is the assembly's embedded resources.
To modify the primary assembly, let's use Cecil, a library that reads and writes .NET assemblies.
The code that virtualizes the files is placed in a separate assembly, BxIlMerge.Api.dll.
The module initializer first sets up an assembly resolver, then calls the method that virtualizes the files.
The assembly resolver is required to provide the content of the two assemblies that are used in the solution: BxIlMerge.Api.dll and BoxedAppSDK.Managed.dll.
Each file is stored as an embedded resource whose name has a special format: bx\<guid>\<file name>.
Cecil provides an easy interface to do it:
    using (AssemblyDefinition inputAssembly = AssemblyDefinition.ReadAssembly(
        inputAssemblyPath,
        new ReaderParameters() { SymbolReaderProvider = new PdbReaderProvider(), ReadSymbols = true }))
    {
        // Add each file as an embedded resource
        var resources = inputAssembly.MainModule.Resources;
        foreach (string fileToEmbedPath in filesToEmbedPaths)
        {
            // Generate a unique name for the embedded resource
            string embeddedResourceName;
            do
            {
                embeddedResourceName = string.Format("bx\\{0}\\{1}", Guid.NewGuid(), Path.GetFileName(fileToEmbedPath));
            }
            while (null != resources.FirstOrDefault(x => x.Name == embeddedResourceName));

            resources.Add(new EmbeddedResource(embeddedResourceName, ManifestResourceAttributes.Public, File.OpenRead(fileToEmbedPath)));
        }
        // (excerpt; the using block continues)
To virtualize the files, just enumerate all the embedded resources, skipping those whose names don't match the format.
We place this code in a separate assembly, BxIlMerge.Api.
    public static void CreateVirtualFiles(Assembly assembly)
    {
        BoxedAppSDK.NativeMethods.BoxedAppSDK_Init();

        // Resource names look like bx\<guid>\<file name>
        const string prefix = @"bx\";
        int separatorIndex = prefix.Length + Guid.Empty.ToString().Length;

        foreach (string embeddedResourceName in assembly.GetManifestResourceNames())
        {
            // Skip resources whose names do not match the format
            if (embeddedResourceName.StartsWith(prefix) &&
                embeddedResourceName.Length > separatorIndex &&
                '\\' == embeddedResourceName[separatorIndex])
            {
                string virtualFileName = embeddedResourceName.Substring(separatorIndex + 1);

                using (SafeFileHandle virtualFileHandle = new SafeFileHandle(
                    BoxedAppSDK.NativeMethods.BoxedAppSDK_CreateVirtualFile(
                        Path.Combine(Path.GetDirectoryName(assembly.Location), virtualFileName),
                        NativeMethods.EFileAccess.GenericWrite,
                        NativeMethods.EFileShare.Read,
                        IntPtr.Zero,
                        NativeMethods.ECreationDisposition.CreateAlways,
                        0,
                        IntPtr.Zero),
                    true))
                using (Stream virtualFileStream = new FileStream(virtualFileHandle, FileAccess.Write))
                using (Stream embeddedResourceStream = assembly.GetManifestResourceStream(embeddedResourceName))
                {
                    // CopyTo reads until the end of the resource; a single
                    // Stream.Read call may return fewer bytes than requested
                    embeddedResourceStream.CopyTo(virtualFileStream);
                }
            }
        }
    }
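The format check inside the loop can also be isolated into a small parser, which makes the "skip what doesn't match" rule easier to see. The sketch below is not taken from the BxILMerge sources: `BxResourceName` and `TryGetFileName` are hypothetical names, it assumes the GUID is written in the default 36-character form that `Guid.NewGuid().ToString()` produces, and unlike the loop above it also rejects an empty file name.

```csharp
using System;

// Hypothetical helper (not part of BxILMerge): recognizes resource names of
// the form bx\<guid>\<file name> and extracts the file name.
public static class BxResourceName
{
    private const string Prefix = @"bx\";
    private const int GuidLength = 36; // length of Guid.ToString() in the default "D" format

    public static bool TryGetFileName(string resourceName, out string fileName)
    {
        fileName = null;
        if (resourceName == null || !resourceName.StartsWith(Prefix, StringComparison.Ordinal))
            return false;

        int separatorIndex = Prefix.Length + GuidLength;

        // The name must contain the second backslash and at least one more character
        if (resourceName.Length <= separatorIndex + 1 || resourceName[separatorIndex] != '\\')
            return false;

        // The part between the two backslashes must really be a GUID
        Guid ignored;
        if (!Guid.TryParseExact(resourceName.Substring(Prefix.Length, GuidLength), "D", out ignored))
            return false;

        fileName = resourceName.Substring(separatorIndex + 1);
        return true;
    }
}
```

With such a helper, the loop body would reduce to one `TryGetFileName` call followed by the creation of the virtual file.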
We are going to virtualize the files using a method that resides in the assembly BxIlMerge.Api.dll, which in turn uses the BoxedApp SDK assembly, BoxedAppSDK.Managed.dll.
Because we need to produce a single file, we have to include these two assemblies in the target assembly and provide them when the .NET runtime requests them.
The AppDomain.AssemblyResolve event lets us resolve assemblies ourselves. So, let's embed the assemblies and generate the code of the resolver:
    static MethodDefinition CreateAssemblyResolveMethod(AssemblyDefinition assembly, MethodDefinition loadAssemblyFromEmbeddedResourceMethod)
    {
        MethodDefinition method = new MethodDefinition(
            "currentDomain_AssemblyResolve_" + Guid.NewGuid().ToString(),
            MethodAttributes.Static,
            assembly.MainModule.ImportReference(typeof(System.Reflection.Assembly)));
The event handler takes two arguments: the sender (an object) and System.ResolveEventArgs with information about the assembly being loaded:
        // Parameters of the event handler
        method.Parameters.Add(new ParameterDefinition(assembly.MainModule.TypeSystem.Object));
        method.Parameters.Add(new ParameterDefinition(assembly.MainModule.ImportReference(typeof(System.ResolveEventArgs))));
We need a variable to store the name of the requested assembly:
        // Variable #0 - to store assembly name
        method.Body.Variables.Add(new VariableDefinition(assembly.MainModule.TypeSystem.String));

        ILProcessor il = method.Body.GetILProcessor();
Save the name of the requested assembly:
        il.Append(il.Create(OpCodes.Ldarg_1));
        il.Append(il.Create(OpCodes.Callvirt, assembly.MainModule.ImportReference(typeof(System.ResolveEventArgs).GetProperty("Name").GetGetMethod())));
        il.Append(il.Create(OpCodes.Stloc_0));

        Instruction nextCheckInstruction = null;

        foreach (System.Reflection.Assembly bxAssembly in new System.Reflection.Assembly[] { typeof(BxIlMerge.Api.Bx).Assembly, typeof(BoxedAppSDK.NativeMethods).Assembly })
        {
            var resources = assembly.MainModule.Resources;
For each assembly, add its content to the embedded resources:
            // Add this bx related assembly to embedded resources
            string embeddedResourceName = Guid.NewGuid().ToString();
            resources.Add(new EmbeddedResource(embeddedResourceName, ManifestResourceAttributes.Public, File.OpenRead(bxAssembly.Location)));

            if (null != nextCheckInstruction)
                il.Append(nextCheckInstruction);
            nextCheckInstruction = il.Create(OpCodes.Nop);

            Instruction foundAssemblyBranchStartInstruction = il.Create(OpCodes.Nop);
Compare the requested assembly name with the full name of the embedded assembly. String.CompareTo returns 0 when the strings are equal, so brfalse jumps to the branch that handles a match:
            il.Append(il.Create(OpCodes.Ldstr, bxAssembly.FullName));
            il.Append(il.Create(OpCodes.Ldloc_0));
            il.Append(il.Create(OpCodes.Call, assembly.MainModule.ImportReference(typeof(System.String).GetMethod("CompareTo", new Type[] { typeof(string) }))));
            il.Append(il.Create(OpCodes.Brfalse, foundAssemblyBranchStartInstruction));
            il.Append(il.Create(OpCodes.Br, nextCheckInstruction));
            il.Append(foundAssemblyBranchStartInstruction);
If they match, call the helper method that loads the assembly from the embedded resource, then return its result:
            il.Append(il.Create(OpCodes.Ldstr, embeddedResourceName));
            il.Append(il.Create(OpCodes.Call, loadAssemblyFromEmbeddedResourceMethod));
            il.Append(il.Create(OpCodes.Ret));
        }

        if (null != nextCheckInstruction)
            il.Append(nextCheckInstruction);
Finally, if the requested assembly is not found among the embedded ones, just return null:
        il.Append(il.Create(OpCodes.Ldnull));
        il.Append(il.Create(OpCodes.Ret));
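Read as C#, the emitted IL corresponds roughly to the handler below. This is a hand-written sketch for orientation, not output of the tool. The class and method names, the assembly display names, and the resource names are placeholders: the tool bakes in the real FullName of each assembly and a fresh GUID as each resource name. The helper at the bottom stands in for the generated method described in the next section.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hand-written illustration of what the emitted resolver does.
public static class ResolverSketch
{
    // Placeholder values (assumptions of this sketch)
    private const string ApiAssemblyName =
        "BxIlMerge.Api, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null";
    private const string ApiResourceName = "00000000-0000-0000-0000-000000000001";
    private const string SdkAssemblyName =
        "BoxedAppSDK.Managed, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null";
    private const string SdkResourceName = "00000000-0000-0000-0000-000000000002";

    // The unrolled chain of comparisons: one "if" per embedded assembly.
    public static string FindEmbeddedResource(string requestedName)
    {
        if (ApiAssemblyName.CompareTo(requestedName) == 0) return ApiResourceName;
        if (SdkAssemblyName.CompareTo(requestedName) == 0) return SdkResourceName;
        return null; // not one of the embedded assemblies
    }

    // The signature matches System.ResolveEventHandler.
    public static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        string resourceName = FindEmbeddedResource(args.Name);
        return resourceName == null ? null : LoadAssemblyFromEmbeddedResource(resourceName);
    }

    private static Assembly LoadAssemblyFromEmbeddedResource(string resourceName)
    {
        Stream stream = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName);
        if (stream == null) return null; // guard added for the sketch only
        byte[] data = new byte[stream.Length];
        stream.Read(data, 0, data.Length);
        return Assembly.Load(data);
    }
}
```

Returning null tells the runtime that this handler did not resolve the request, so requests for any other assembly are unaffected.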
Loading Assembly From Embedded Resource
This helper is used in the assembly resolver.
    static MethodDefinition CreateLoadAssemblyFromEmbeddedResourceMethod(AssemblyDefinition assembly)
    {
        MethodDefinition method = new MethodDefinition(
            "currentDomain_AssemblyResolve_" + Guid.NewGuid().ToString(),
            MethodAttributes.Static,
            assembly.MainModule.ImportReference(typeof(System.Reflection.Assembly)));
The method takes a single argument - the embedded resource name.
        // Parameter #0 - name of embedded resource
        method.Parameters.Add(new ParameterDefinition(assembly.MainModule.TypeSystem.String));
We need two variables: to store the embedded resource stream, and a buffer to store the content of the stream:
        // Variable #0 - embedded resource stream
        method.Body.Variables.Add(new VariableDefinition(assembly.MainModule.ImportReference(typeof(Stream))));
        // Variable #1 - byte array to store embedded resource stream content
        method.Body.Variables.Add(new VariableDefinition(new ArrayType(assembly.MainModule.TypeSystem.Byte)));

        ILProcessor il = method.Body.GetILProcessor();
Get the embedded resource stream:
        il.Append(il.Create(OpCodes.Call, assembly.MainModule.ImportReference(typeof(System.Reflection.Assembly).GetMethod("GetExecutingAssembly"))));
        il.Append(il.Create(OpCodes.Ldarg_0));
        il.Append(il.Create(OpCodes.Callvirt, assembly.MainModule.ImportReference(typeof(System.Reflection.Assembly).GetMethod("GetManifestResourceStream", new Type[] { typeof(string) }))));
        il.Append(il.Create(OpCodes.Stloc_0));
Get the length of the embedded resource stream and create a byte array of this length:
        il.Append(il.Create(OpCodes.Ldloc_0));
        il.Append(il.Create(OpCodes.Callvirt, assembly.MainModule.ImportReference(typeof(Stream).GetProperty("Length").GetGetMethod())));
        // Stream.Length is a long; convert it to int32, which newarr accepts
        il.Append(il.Create(OpCodes.Conv_Ovf_I4));
        il.Append(il.Create(OpCodes.Newarr, assembly.MainModule.TypeSystem.Byte));
        il.Append(il.Create(OpCodes.Stloc_1));
Read the entire content from the stream to the array:
        il.Append(il.Create(OpCodes.Ldloc_0));
        il.Append(il.Create(OpCodes.Ldloc_1));
        il.Append(il.Create(OpCodes.Ldc_I4_0));
        il.Append(il.Create(OpCodes.Ldloc_1));
        il.Append(il.Create(OpCodes.Ldlen));
        il.Append(il.Create(OpCodes.Callvirt, assembly.MainModule.ImportReference(typeof(Stream).GetMethod("Read", new Type[] { typeof(byte[]), typeof(int), typeof(int) }))));
        il.Append(il.Create(OpCodes.Pop)); // we don't need the result
Load the assembly from the array:
        il.Append(il.Create(OpCodes.Ldloc_1));
        il.Append(il.Create(OpCodes.Call, assembly.MainModule.ImportReference(typeof(System.Reflection.Assembly).GetMethod("Load", new Type[] { typeof(byte[]) }))));
        il.Append(il.Create(OpCodes.Ret));
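The emitted helper calls Stream.Read once and discards the return value. The contract of Stream.Read allows it to return fewer bytes than requested, so a more defensive variant would keep reading until the buffer is full. Here is a minimal sketch of that loop in plain C#. It is not part of the tool, and `StreamSketch` and `ReadExactly` are hypothetical names. Emitting the same loop as IL would take a few more instructions and one branch.

```csharp
using System;
using System.IO;

public static class StreamSketch
{
    // Reads exactly 'count' bytes from the current position of 'source',
    // however many bytes each individual Read call returns.
    public static byte[] ReadExactly(Stream source, int count)
    {
        byte[] data = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = source.Read(data, offset, count - offset);
            if (read == 0)
            {
                // Read returns 0 only at the end of the stream
                throw new EndOfStreamException(
                    "Stream ended after " + offset + " of " + count + " bytes.");
            }
            offset += read;
        }
        return data;
    }
}
```

For an embedded resource, the call would be `StreamSketch.ReadExactly(stream, checked((int)stream.Length))`.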
Module Initializer to Virtualize Files
We have the code that virtualizes the files in the assembly BxIlMerge.Api, but where is the code called? Who is the caller?
Here we use a CLR feature called "module initializer" - a method that is called on module loading. So, we add this method to the primary assembly.
This method sets up the assembly resolver to load the working assemblies (BxIlMerge.Api.dll and BoxedAppSDK.Managed.dll) and then calls the method that virtualizes the files using BoxedApp SDK.
Why not virtualize the files directly in the module initializer? The CLR loads the assemblies a method references before it executes the method. So, if the module initializer itself called BoxedApp SDK methods, the CLR would attempt to load BoxedAppSDK.Managed.dll before the assembly resolver was set up. That's the reason for splitting the code.
The module initializer is the static constructor of the module's special <Module> type, so it has the special name .cctor:
    MethodDefinition cctor = new MethodDefinition(
        ".cctor",
        MethodAttributes.Static | MethodAttributes.SpecialName | MethodAttributes.RTSpecialName,
        assembly.MainModule.TypeSystem.Void);

    ILProcessor il = cctor.Body.GetILProcessor();
Add assembly resolver:
    // AppDomain.CurrentDomain.AssemblyResolve += currentDomain_AssemblyResolve;
    il.Append(il.Create(OpCodes.Call, assembly.MainModule.ImportReference(typeof(System.AppDomain).GetProperty("CurrentDomain").GetGetMethod())));
    il.Append(il.Create(OpCodes.Ldnull));
    il.Append(il.Create(OpCodes.Ldftn, assemblyResolveMethod));
    il.Append(il.Create(OpCodes.Newobj, assembly.MainModule.ImportReference(typeof(System.ResolveEventHandler).GetConstructor(new Type[] { typeof(object), typeof(IntPtr) }))));
    il.Append(il.Create(OpCodes.Callvirt, assembly.MainModule.ImportReference(typeof(System.AppDomain).GetEvent("AssemblyResolve").GetAddMethod())));
Then call the method that virtualizes the embedded files:
    il.Append(il.Create(OpCodes.Call, callCreateVirtualFilesMethod));
    il.Append(il.Create(OpCodes.Ret));
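Written by hand in C#, the shape of the emitted initializer and the reason for the split look roughly like the sketch below. All names are hypothetical, the log exists only so the order of the two steps can be observed without the SDK, and the placeholder body stands in for the real call into BxIlMerge.Api. Marking the wrapper NoInlining is an assumption of this sketch; the article does not say whether the tool does the same.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class ModuleInitializerSketch
{
    // Records the order of the two steps
    public static readonly List<string> Log = new List<string>();

    // Corresponds to the emitted .cctor: it references only framework types,
    // so running it does not require any of the embedded assemblies.
    public static void Initialize()
    {
        AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;
        Log.Add("resolver installed");
        CallCreateVirtualFiles();
    }

    // Kept out of line so that the types it references are resolved only when
    // this method itself is compiled, after the resolver is in place.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void CallCreateVirtualFiles()
    {
        // Placeholder: the real method calls into BxIlMerge.Api here.
        Log.Add("virtual files created");
    }

    private static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        return null; // placeholder: the real handler serves the embedded assemblies
    }
}
```

If the wrapper were inlined into the initializer, its references to BxIlMerge.Api could be resolved before the event handler is attached, which is exactly the situation the split is meant to avoid.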