C# code is intellectual property, and no company wants its code stolen. Unfortunately, compiled code is open to everyone: it is easy to extract, modify, and rebuild. Embedded resources are just as exposed.
.NET obfuscators to the rescue!
Why do you need an obfuscator?
First of all, let’s consider what happens to C# code from the moment you click Build in your favorite IDE until the code is executed on a client machine.
The C# compiler generates machine-independent code, the so-called IL (“intermediate language”) bytecode. This code is then converted to machine code on demand: when execution reaches a method for the first time, a JIT (“just-in-time”) compiler produces machine code that the CPU executes. Java works in a similar way: yes, there are two compilers in the managed world.
This is how the .NET platform is implemented. Unlike C++ and other languages from the unmanaged world, it is the machine-independent code (IL bytecode), not machine code, that is deployed to clients.
But what does this bytecode look like? Is it easy for humans to decode? Can it be modified?
Let’s look at some typical IL code:
.class private auto ansi beforefieldinit armdot_sample_embed_file.Program
       extends [mscorlib]System.Object
{
  .method private hidebysig static void Main(string[] args) cil managed
  {
    .entrypoint
    // Code size 38 (0x26)
    .maxstack 8
    IL_0000: nop
    IL_0001: call class [mscorlib]System.Reflection.Assembly [mscorlib]System.Reflection.Assembly::GetExecutingAssembly()
    IL_0006: callvirt instance string [mscorlib]System.Reflection.Assembly::get_Location()
    IL_000b: call string [mscorlib]System.IO.Path::GetDirectoryName(string)
    IL_0010: ldstr "somefile.txt"
    IL_0015: call string [mscorlib]System.IO.Path::Combine(string, string)
    IL_001a: call string [mscorlib]System.IO.File::ReadAllText(string)
    IL_001f: call void [mscorlib]System.Console::WriteLine(string)
    IL_0024: nop
    IL_0025: ret
  } // end of method Program::Main
Indeed, anyone can see the string literals and the names of the methods, types, fields, properties, and events that are used. The execution flow is obvious.
Many tools can convert such code back to C# or another .NET language. After that, anyone can easily modify and rebuild the sources.
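For comparison, the IL listing above corresponds roughly to the C# below. This is a hand-written reconstruction of the kind of source a decompiler would typically recover, not the output of any particular tool; the namespace keeps the underscores that appear in the IL.

```csharp
using System;
using System.IO;
using System.Reflection;

namespace armdot_sample_embed_file
{
    class Program
    {
        static void Main(string[] args)
        {
            // Build the path to somefile.txt next to the executing assembly,
            // read the whole file, and print its contents to the console.
            Console.WriteLine(
                File.ReadAllText(
                    Path.Combine(
                        Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location),
                        "somefile.txt")));
        }
    }
}
```

Running it requires a file named somefile.txt in the same directory as the compiled assembly; otherwise File.ReadAllText throws a FileNotFoundException.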
That is why developers often ask how to obfuscate C# code. Let’s look at the obfuscation techniques available.
Names Obfuscation
The names of all types, methods, fields, events, and properties are stored in an assembly and are visible to anyone. Because developers choose meaningful, human-readable names, this information reveals a lot about how a particular application works.
Obfuscators rename these items to meaningless words, or even to unprintable characters, to hide this information. This is the very first step for those who want to obfuscate their code.
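A minimal sketch of the effect of renaming, written by hand for illustration. The class LicenseValidator, its checksum formula, and the replacement names a, b, d are all hypothetical; a real obfuscator renames items in the compiled assembly rather than in source.

```csharp
using System;

// Before obfuscation: the names alone describe what the code does.
public class LicenseValidator
{
    private readonly int expectedChecksum;

    public LicenseValidator(int expectedChecksum)
    {
        this.expectedChecksum = expectedChecksum;
    }

    public bool IsKeyValid(string licenseKey)
    {
        int checksum = 0;
        foreach (char c in licenseKey)
        {
            checksum = (checksum * 31 + c) % 65521;
        }
        return checksum == expectedChecksum;
    }
}

// After renaming: identical behavior, but the names no longer
// say anything about licenses, keys, or checksums.
public class a
{
    private readonly int b;

    public a(int c)
    {
        b = c;
    }

    public bool d(string e)
    {
        int f = 0;
        foreach (char g in e)
        {
            f = (f * 31 + g) % 65521;
        }
        return f == b;
    }
}

public static class RenamingDemo
{
    public static void Main()
    {
        // For "AB": 'A' = 65, then 65 * 31 + 66 = 2081.
        Console.WriteLine(new LicenseValidator(2081).IsKeyValid("AB")); // True
        Console.WriteLine(new a(2081).d("AB"));                         // True
    }
}
```

Both classes compute the same result; only the information carried by the names is lost.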
Strings Obfuscation
If you look at any .NET code, you will see that string literals are used widely: database connection strings, control captions, method and type names, and so on. They are everywhere!
But string literals also make it much easier to understand how an application works, so it is a good idea to hide the strings used in the code to confuse an attacker.
.NET obfuscators replace instructions that load string literals with code that computes each string dynamically at runtime.
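The idea can be sketched in C# as follows. This is a simplified illustration, not the scheme of any particular obfuscator: the XOR encoding, the key 0x5A, and the names Encode and Decode are assumptions made for the example, and real tools use stronger, per-string transformations.

```csharp
using System;
using System.Text;

public static class StringProtectionDemo
{
    // Hypothetical key chosen for this sketch.
    private const byte Key = 0x5A;

    // Simulates the build-time step: an obfuscator would store only
    // the encoded bytes in the assembly, never the original literal.
    public static byte[] Encode(string text)
    {
        byte[] data = Encoding.UTF8.GetBytes(text);
        for (int i = 0; i < data.Length; i++)
        {
            data[i] ^= (byte)(Key + i);
        }
        return data;
    }

    // The code that replaces the original "load string" instruction:
    // it rebuilds the string from the encoded bytes at runtime.
    public static string Decode(byte[] encoded)
    {
        byte[] data = new byte[encoded.Length];
        for (int i = 0; i < encoded.Length; i++)
        {
            data[i] = (byte)(encoded[i] ^ (byte)(Key + i));
        }
        return Encoding.UTF8.GetString(data);
    }

    public static void Main()
    {
        byte[] stored = Encode("Server=db01;Database=orders");

        // What someone inspecting the assembly would find.
        Console.WriteLine(BitConverter.ToString(stored));

        // What the application sees at runtime.
        Console.WriteLine(Decode(stored));
    }
}
```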
Control Flow Obfuscation
IL has several opcodes that control execution flow. Some of them pop a value from the execution stack and transfer execution to one instruction or another depending on that value. Others compare two values taken from the stack to decide where to transfer execution.
It is very easy to see how such code works. Control flow obfuscation disguises the logic of an application.
Obfuscators split a method into branches and place them one after another. With this array of branches, the obfuscator creates one large loop that runs the branch selected by an index and changes that index at the end of each iteration. The original control flow is hidden because it has been replaced by logic that looks uniform. An attacker has to decode how the next branch index is calculated and restore the original set of branches. When the number of branches reaches the hundreds, that is a very hard task!
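The transformation described above is often called control flow flattening. A minimal hand-written sketch follows; the method SumOfEvens and the branch numbering are invented for the example, and a real obfuscator works on IL and usually computes the next index with far less obvious arithmetic.

```csharp
using System;

public static class ControlFlowDemo
{
    // Original method: the loop and the condition are easy to read.
    public static int SumOfEvens(int limit)
    {
        int sum = 0;
        for (int i = 0; i <= limit; i++)
        {
            if (i % 2 == 0)
            {
                sum += i;
            }
        }
        return sum;
    }

    // Flattened method: every branch becomes a case in one big loop,
    // and the branch index decides what runs next.
    public static int SumOfEvensFlattened(int limit)
    {
        int sum = 0;
        int i = 0;
        int next = 3; // index of the first branch to run
        while (true)
        {
            switch (next)
            {
                case 3: // loop condition
                    next = (i <= limit) ? 0 : 4;
                    break;
                case 0: // parity test
                    next = (i % 2 == 0) ? 2 : 1;
                    break;
                case 2: // body of the "if"
                    sum += i;
                    next = 1;
                    break;
                case 1: // loop increment
                    i++;
                    next = 3;
                    break;
                case 4: // exit
                    return sum;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumOfEvens(10));          // 30
        Console.WriteLine(SumOfEvensFlattened(10)); // 30
    }
}
```

With five branches the original logic can still be recovered by hand; the approach pays off when a method is split into hundreds of branches.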
But there is an even better approach, one that hides not only the control flow but also every change to the execution stack, parameters, and local variables.
Code Virtualization
The most modern obfuscation technique available today is code virtualization. What ideas lie behind it?
As you already know, IL is intermediate, machine-independent code. We can imagine IL being executed by a virtual machine that maintains the execution stack, method arguments, and local variables. In reality, of course, no such interpreter runs your code, because IL is compiled to machine code by the JIT compiler. But wait: what if we created such a machine ourselves? Moreover, the machine could have its own opcode set which, unlike the original IL opcode set, is unknown to everyone else.
The idea is simple: the original instructions of a method are converted to a form that only the virtual machine can decode. At runtime, the virtual machine’s code reads this data and interprets it.
Now not only is the execution flow hidden; all manipulation of local variables and parameters, method calls, and field loads and stores is unreadable as well.
This makes the approach very hard to break, because each virtual machine and its opcode set are unique to each protected method.
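A toy version of such a virtual machine fits in a few dozen lines. The sketch below is an illustration only: the opcode values, the int[] encoding, and the names TinyVm, Run, and Bytecode are invented for the example, and a real protector generates a far richer instruction set and hides the interpreter itself.

```csharp
using System;
using System.Collections.Generic;

public static class TinyVm
{
    // Private opcode numbering invented for this sketch. A protector
    // would generate a different set for every protected method.
    private const int LoadArg = 0x71;   // push args[operand]
    private const int PushConst = 0x1C; // push operand
    private const int Add = 0x4E;       // pop two values, push their sum
    private const int Mul = 0x09;       // pop two values, push their product
    private const int Ret = 0xB3;       // pop the result and return it

    // Encoded form of: return (a + b) * 2;
    private static readonly int[] Bytecode =
    {
        LoadArg, 0, LoadArg, 1, Add, PushConst, 2, Mul, Ret
    };

    // The interpreter: it keeps its own stack and program counter.
    public static int Run(int[] code, params int[] args)
    {
        var stack = new Stack<int>();
        int pc = 0;
        while (true)
        {
            int op = code[pc++];
            switch (op)
            {
                case LoadArg: stack.Push(args[code[pc++]]); break;
                case PushConst: stack.Push(code[pc++]); break;
                case Add: stack.Push(stack.Pop() + stack.Pop()); break;
                case Mul: stack.Push(stack.Pop() * stack.Pop()); break;
                case Ret: return stack.Pop();
                default:
                    throw new InvalidOperationException("Unknown opcode " + op);
            }
        }
    }

    // The method as the developer wrote it.
    public static int Original(int a, int b)
    {
        return (a + b) * 2;
    }

    // The method after virtualization: its body is only a call into the VM.
    public static int Virtualized(int a, int b)
    {
        return Run(Bytecode, a, b);
    }

    public static void Main()
    {
        Console.WriteLine(Original(3, 4));    // 14
        Console.WriteLine(Virtualized(3, 4)); // 14
    }
}
```

Someone reading Virtualized sees only a call to Run and an array of numbers; to recover the arithmetic they must first work out what each opcode means.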
Embedded Resources Encryption
Almost every .NET application has embedded resources such as string tables, images, and media. Such assets are often intellectual property and should be encrypted.
.NET obfuscators remove the original embedded resources from an application, store them in encrypted form, and add code that supplies each resource on demand. The decrypted resources exist only in process memory and are never saved to disk.
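The on-demand part can be sketched as below. This is a simplified illustration under stated assumptions: the class ProtectedResources, the hard-coded AES key and IV, and the Encrypt helper that stands in for the build-time step are all hypothetical, and real tools hide the key and hook into resource loading in their own ways.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class ProtectedResources
{
    // Hypothetical hard-coded key and IV (16 bytes each, AES-128),
    // used here for illustration only.
    private static readonly byte[] Key = Encoding.ASCII.GetBytes("0123456789abcdef");
    private static readonly byte[] Iv = Encoding.ASCII.GetBytes("fedcba9876543210");

    // Simulates the build-time step: the obfuscator would store only
    // the encrypted bytes in the assembly.
    public static byte[] Encrypt(byte[] plain)
    {
        using (Aes aes = Aes.Create())
        using (ICryptoTransform encryptor = aes.CreateEncryptor(Key, Iv))
        {
            return encryptor.TransformFinalBlock(plain, 0, plain.Length);
        }
    }

    // Runtime step: decrypt on demand and hand back an in-memory
    // stream, so the plain resource is never written to disk.
    public static Stream Open(byte[] encrypted)
    {
        using (Aes aes = Aes.Create())
        using (ICryptoTransform decryptor = aes.CreateDecryptor(Key, Iv))
        {
            byte[] plain = decryptor.TransformFinalBlock(encrypted, 0, encrypted.Length);
            return new MemoryStream(plain, writable: false);
        }
    }

    public static void Main()
    {
        byte[] blob = Encrypt(Encoding.UTF8.GetBytes("Hello from a protected resource"));

        using (var reader = new StreamReader(Open(blob)))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```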
Conclusion
Like any .NET language, C# compiles to intermediate code that is easy to understand. Many ready-to-use tools restore source code from it, so anyone can review the code, modify it, and rebuild it.
.NET obfuscators work on IL code, modifying it to make the code harder to understand. There are several levels: name obfuscation, string encryption, control flow obfuscation, and, the most advanced, code virtualization.