Coding Range

Emit all the things!

September 8th, 2013

IL generation is something I thought was kinda cool but, like assembly language, too low-level to be able to read, never mind write. IL in particular seemed beyond the realms of readable, but after having a poke around a JIT branch of Steam4NET, it seemed like there wasn’t really much to it.

The best way to learn is by doing, in my experience, so today I took a few hours to try to build a Steam Web API client with a hardcoded interface and a (mostly) dynamically generated implementation. The results of that can be found here.

If you have a basic grasp of reflection in .NET, the rest is actually quite simple. Reflection, or introspection, is the ability to ask the runtime for information about what’s going on, and sometimes to change it. Things like:

  • What’s the name of the class I’m in?
  • Is this class a subclass of that class?
  • Is this unknown object a string?
  • Does MyFooClass have a method named “BarMethod”?
  • Set the instance variable (field in .NET) named “myvar” on this object to the value “avalue”.

And plenty more. Fortunately, creating the structure of methods is fairly easy, as the builders use the same objects that .NET’s reflection APIs use. Calling methods from methods is also fairly easy, as calling a method requires a MethodInfo - another object used by ‘normal’ reflection. Below is halfway between a blog post and journal notes.
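To make the list above a bit more concrete, here is a rough sketch of those reflection questions in C#. MyFooClass, BarMethod and myvar are taken from the bullets; the class body and the ReflectionSketch wrapper are made up for illustration.

```csharp
using System;
using System.Reflection;

// Hypothetical class matching the names used in the bullet list.
public class MyFooClass
{
    public string myvar;

    public string BarMethod() { return "bar: " + myvar; }
}

public static class ReflectionSketch
{
    public static string Run()
    {
        object unknown = "hello";
        var foo = new MyFooClass();
        Type type = foo.GetType();

        string className = type.Name;                        // name of the class
        bool isSubclass = type.IsSubclassOf(typeof(object)); // is it a subclass of that class?
        bool isString = unknown is string;                   // is this unknown object a string?
        MethodInfo bar = type.GetMethod("BarMethod");        // null if there is no such method

        // Set the field named "myvar" on this object to the value "avalue".
        FieldInfo field = type.GetField("myvar");
        field.SetValue(foo, "avalue");

        // A MethodInfo can be invoked directly; the same kind of object
        // is what gets handed to the call opcode when emitting IL.
        return (string)bar.Invoke(foo, null);
    }
}
```

The MethodInfo and ConstructorInfo objects obtained this way are the ones that get passed to il.Emit later on.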

That said, it’s still pretty low-level. In order to call a method, one has to set up all the method arguments in the correct order, and then call the method. For example, the following constructor

public MyClass(string thing)
    : base(thing)
{
}

is about as simple as a constructor of a subclass can get. When dynamically generating it, that becomes:

TypeBuilder typeBuilder; // get this from somewhere
var constructorBuilder = typeBuilder.DefineConstructor(MethodAttributes.Public | MethodAttributes.HideBySig, CallingConventions.HasThis, new[] { typeof(string) });
var baseConstructor = typeof(BaseClass).GetConstructor(BindingFlags.Public | BindingFlags.Instance, null, new Type[] { typeof(string) }, new ParameterModifier[] { });
var il = constructorBuilder.GetILGenerator();

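il.Emit(OpCodes.Ldarg_0); // this
il.Emit(OpCodes.Ldarg_1); // thing
il.Emit(OpCodes.Call, baseConstructor); // base(thing)
il.Emit(OpCodes.Ret); // every method body, constructors included, has to end with ret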

So here the ‘OpCode’ ldarg loads an argument onto the evaluation stack. Arguments are zero-indexed, and for an instance method argument 0 is always this, as dictated by the calling convention. The first declared parameter is then ldarg 1, the second is ldarg 2 and so on. (For a static method there is no this, so the first parameter is argument 0.)

Once the evaluation stack contains all the arguments we need, we can call a method using the call OpCode, which pops them off again. If the method returns a value, that value will be on the top of the evaluation stack afterwards, ready to pass on to another method or to return. A base constructor returns nothing, so in this case the stack is empty after the call and all that is left is to emit ret to end the constructor.
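For completeness, here is a sketch of the whole round trip: define a dynamic type, emit the constructor described above, and instantiate it. The assembly and module name "DynamicSketch", the body of BaseClass and the ConstructorSketch wrapper are my own placeholders, and I'm assuming a runtime where the static AssemblyBuilder.DefineDynamicAssembly is available.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Placeholder base class; only the string constructor matters here.
public class BaseClass
{
    public string Thing { get; private set; }

    public BaseClass(string thing) { Thing = thing; }
}

public static class ConstructorSketch
{
    public static BaseClass Create(string thing)
    {
        var assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DynamicSketch"), AssemblyBuilderAccess.Run);
        var module = assembly.DefineDynamicModule("DynamicSketch");
        var typeBuilder = module.DefineType("MyClass", TypeAttributes.Public, typeof(BaseClass));

        var constructorBuilder = typeBuilder.DefineConstructor(
            MethodAttributes.Public | MethodAttributes.HideBySig,
            CallingConventions.HasThis,
            new[] { typeof(string) });
        var baseConstructor = typeof(BaseClass).GetConstructor(new[] { typeof(string) });

        var il = constructorBuilder.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);               // this
        il.Emit(OpCodes.Ldarg_1);               // thing
        il.Emit(OpCodes.Call, baseConstructor); // base(thing), leaves nothing on the stack
        il.Emit(OpCodes.Ret);                   // end of the constructor

        // Bake the type and create an instance through ordinary reflection.
        Type myClass = typeBuilder.CreateType();
        return (BaseClass)Activator.CreateInstance(myClass, new object[] { thing });
    }
}
```

Calling ConstructorSketch.Create("hello") should hand back an object whose runtime type is the generated MyClass and whose Thing property was set by the base constructor.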

Regarding local variables, each one has to be declared up front with il.DeclareLocal, and it’s then up to the person writing the opcodes to keep track of which index is which (indices are handed out in declaration order, starting at 0). For the dictionary of API parameters, for example, there’s one local variable (the dictionary), so we store it in position 0 using the stloc opcode. Later, when passing the variable to another method, we can retrieve it using the ldloc opcode.
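A minimal sketch of the stloc/ldloc dance, using a DynamicMethod so there is no type to define. The method name "MakeParameters", the "key" entry and the LocalsSketch wrapper are invented for the example and are not how the actual client names things.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class LocalsSketch
{
    public static Func<string, Dictionary<string, object>> Build()
    {
        Type dictType = typeof(Dictionary<string, object>);
        var method = new DynamicMethod("MakeParameters", dictType, new[] { typeof(string) });
        var il = method.GetILGenerator();

        // The first declared local gets index 0.
        il.DeclareLocal(dictType);

        il.Emit(OpCodes.Newobj, dictType.GetConstructor(Type.EmptyTypes));
        il.Emit(OpCodes.Stloc_0);       // local 0 = new Dictionary<string, object>()

        il.Emit(OpCodes.Ldloc_0);       // the dictionary (instance for Add)
        il.Emit(OpCodes.Ldstr, "key");  // first argument of Add
        il.Emit(OpCodes.Ldarg_0);       // second argument; a DynamicMethod is static, so arg 0 is the first parameter
        il.Emit(OpCodes.Callvirt, dictType.GetMethod("Add"));

        il.Emit(OpCodes.Ldloc_0);       // load the dictionary again as the return value
        il.Emit(OpCodes.Ret);

        return (Func<string, Dictionary<string, object>>)method.CreateDelegate(
            typeof(Func<string, Dictionary<string, object>>));
    }
}
```

Roughly, this is what `var d = new Dictionary<string, object>(); d.Add("key", value); return d;` looks like when written out by hand.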

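Some opcodes have shorthand forms. For example, il.Emit(OpCodes.Ldarg, (short)0) and il.Emit(OpCodes.Ldarg_0) load the same argument; the second is just a more compact encoding with the index baked into the opcode. Note that the long form takes a 16-bit operand, so as far as I can tell the index should be passed as a short rather than a plain int.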
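To check that the long and short forms really behave the same, here is a small sketch that builds the same identity function both ways. The ShortFormSketch wrapper and its flag are made up; the cast to short is there because ldarg takes a 16-bit operand.

```csharp
using System;
using System.Reflection.Emit;

public static class ShortFormSketch
{
    public static Func<int, int> BuildIdentity(bool useShortForm)
    {
        var method = new DynamicMethod("Identity", typeof(int), new[] { typeof(int) });
        var il = method.GetILGenerator();

        if (useShortForm)
        {
            il.Emit(OpCodes.Ldarg_0);         // index is part of the opcode itself
        }
        else
        {
            il.Emit(OpCodes.Ldarg, (short)0); // index passed as a 16-bit operand
        }
        il.Emit(OpCodes.Ret);

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }
}
```

Both delegates return whatever integer they are given; the only difference is how many bytes the instruction takes up in the method body.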

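When passing a value type such as an integer to a parameter that expects an object, it has to be boxed first. This can be done through il.Emit(OpCodes.Box, typeOfVariableHere) immediately after loading the value onto the stack. If the parameter is declared as the value type itself (an int parameter, say), no boxing is needed.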
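As a sketch of where boxing comes in, the following passes an int to string.Concat(object, object). The "n=" prefix and the BoxingSketch wrapper are just for illustration; the string argument is already a reference type, so only the int gets a box instruction.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class BoxingSketch
{
    public static Func<int, string> Build()
    {
        var method = new DynamicMethod("Describe", typeof(string), new[] { typeof(int) });
        var il = method.GetILGenerator();

        MethodInfo concat = typeof(string).GetMethod(
            "Concat", new[] { typeof(object), typeof(object) });

        il.Emit(OpCodes.Ldstr, "n=");       // reference type, no boxing needed
        il.Emit(OpCodes.Ldarg_0);           // the int argument
        il.Emit(OpCodes.Box, typeof(int));  // wrap it so it can be passed as object
        il.Emit(OpCodes.Call, concat);      // static method, leaves a string on the stack
        il.Emit(OpCodes.Ret);

        return (Func<int, string>)method.CreateDelegate(typeof(Func<int, string>));
    }
}
```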

Constants can be loaded onto the stack using other ld* opcodes. For example, ldstr will load a string (e.g. il.Emit(OpCodes.Ldstr, "String Theory")).
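A quick sketch with two kinds of constants, a string via ldstr and an integer via ldc.i4, fed straight into a method call. The choice of Substring and the ConstantsSketch wrapper are mine.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class ConstantsSketch
{
    public static Func<string> Build()
    {
        var method = new DynamicMethod("Theory", typeof(string), Type.EmptyTypes);
        var il = method.GetILGenerator();

        MethodInfo substring = typeof(string).GetMethod("Substring", new[] { typeof(int) });

        il.Emit(OpCodes.Ldstr, "String Theory"); // string constant (the instance for Substring)
        il.Emit(OpCodes.Ldc_I4, 7);              // 32-bit integer constant (the start index)
        il.Emit(OpCodes.Callvirt, substring);    // "String Theory".Substring(7)
        il.Emit(OpCodes.Ret);

        return (Func<string>)method.CreateDelegate(typeof(Func<string>));
    }
}
```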

The newobj opcode is the equivalent of the new keyword. Push the constructor’s arguments onto the stack, give it a ConstructorInfo, and it will leave the new object on top of the stack for you.
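Here is a small sketch of newobj with a constructor that takes an argument, equivalent to `new StringBuilder("String Theory")`. StringBuilder is just a convenient stand-in and the NewobjSketch wrapper is made up.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Text;

public static class NewobjSketch
{
    public static Func<StringBuilder> Build()
    {
        var method = new DynamicMethod("MakeBuilder", typeof(StringBuilder), Type.EmptyTypes);
        var il = method.GetILGenerator();

        ConstructorInfo ctor = typeof(StringBuilder).GetConstructor(new[] { typeof(string) });

        il.Emit(OpCodes.Ldstr, "String Theory"); // constructor argument
        il.Emit(OpCodes.Newobj, ctor);           // pops the argument, pushes the new object
        il.Emit(OpCodes.Ret);

        return (Func<StringBuilder>)method.CreateDelegate(typeof(Func<StringBuilder>));
    }
}
```

Unlike the base constructor call earlier, there is no this to load first: newobj allocates the object itself and passes it to the constructor.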

I’ve probably only just scratched the surface, but it’s good to be able to finally have some understanding of what IL opcodes mean - and it’s always cool to learn what’s going on at a much lower level than the happy world of abstraction.