Tuesday, 24 April 2012

Reversing Binaries & Assembly

Caveats: This tutorial targets Intel's x86 (32-bit) architecture, on Windows.

Introduction
For those who are new to the subject, let me shed some light. Reverse engineering is the art of inspecting, intercepting, interrupting, or manipulating an application's inner workings. Even with obfuscation and protections, nothing can really be hidden from the prying eyes of a determined cracker whose only intention is to tear the application's innards apart. This comes down to one rule: if the application is to work, it has to be truthful to the machine. It cannot do its job without revealing itself to the machine first. And in between the application and the hardware sit the operating system, drivers, and other software. This is where crackers come in. With debuggers such as OllyDbg and Immunity Debugger, you can not only take control of a program's flow and logic, you can also produce patched software with your own code included.

Assembly language
Assembly is the closest language to machine language (if you can crack a complicated binary with nothing but raw machine code, you, sir, have won). Compiled high-level languages are ultimately translated into machine code, often by way of assembly, and an assembler is the tool that turns assembly into the machine code the processor actually executes. Assembly language has human-readable instruction names (mnemonics, each standing for an opcode) such as MOV, ADD, and JMP. The basic operations fall into 3 groups: Logic, Arithmetic, and Jumps.

Assembly can be quite simple once you understand its logic. In fact, it can be the most efficient programming language you'll ever know. For the upcoming tutorials, understanding basic assembly language is a MUST. Otherwise, you will have a hard time following the details of reversing binaries.

So, here goes the first lesson:

Memory and registers
Unlike high-level languages, which use variables, assembly has two places to store its numbers: memory and registers. And then there are flags. They are boolean values that hold either true or false. Some instructions set flags, and some instructions use flags in their operation. Memory is pretty straightforward; I will explain it later. Registers are something you should pay attention to. On Intel x86, the available general-purpose registers are listed below.


Each register has its own fixed name:
EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP. They are all 32 bits wide. AX, BX, CX, and DX are the lower 16 bits of EAX, EBX, ECX, and EDX. AH and AL are the two 8-bit halves of AX: AH is the higher 8 bits and AL is the lower 8 bits. The same goes for BX, CX, and DX, each with its own high and low halves (BH/BL, CH/CL, DH/DL).
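
To make the relationship between EAX, AX, AH, and AL concrete, here is a small C sketch that pulls the sub-registers out of a 32-bit value with masks and shifts. The helper names (reg_ax, reg_ah, reg_al, reg_set_al) are made up for this illustration; the CPU does this in hardware, not with function calls.

```c
#include <stdint.h>

/* Hypothetical helpers: model the sub-register views of a 32-bit EAX value. */

/* AX is the lower 16 bits of EAX. */
static uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AH is the upper 8 bits of AX (bits 8..15 of EAX). */
static uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* AL is the lowest 8 bits of EAX. */
static uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing AL replaces only the lowest byte; the other 24 bits stay untouched. */
static uint32_t reg_set_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}
```

For example, with EAX holding 0x12345678, reg_ax gives 0x5678, reg_ah gives 0x56, and reg_al gives 0x78.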

Size is IMPORTANT when it comes to assembly. Assembly language does not have data types; it treats all data as raw bits and bytes, so overflow must be handled manually most of the time. There are specific instructions for moving data between different sizes, which I leave for you to explore.
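
As a rough illustration of why size matters, the C sketch below models an 8-bit addition that wraps around, the carry it would produce, and the two usual ways of widening an 8-bit value to 32 bits. The function names are hypothetical. The comparison with MOVZX and MOVSX (the x86 zero-extend and sign-extend moves) is my own pointer for further reading, not something covered in this tutorial.

```c
#include <stdint.h>

/* Models an 8-bit add such as "ADD AL, BL": the result is truncated to 8 bits. */
static uint8_t add8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }

/* Models the carry: 1 when the true sum does not fit in 8 bits. */
static int add8_carry(uint8_t a, uint8_t b)
{
    return ((unsigned)a + (unsigned)b) > 0xFFu;
}

/* Zero extension (what MOVZX does): the upper 24 bits become 0. */
static uint32_t zero_extend8(uint8_t v) { return (uint32_t)v; }

/* Sign extension (what MOVSX does): the upper 24 bits copy bit 7.
   The cast to int8_t assumes a two's complement machine, which x86 is. */
static int32_t sign_extend8(uint8_t v) { return (int32_t)(int8_t)v; }
```

The same byte 0x80 becomes 128 when zero-extended and -128 when sign-extended, which is exactly the kind of detail assembly leaves up to you.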

Assembly interacts with your RAM indirectly, through virtual memory. If you're not familiar with the concept of memory addressing, do read up on it before continuing.

Instructions
All assembly instructions follow a syntax, just like other languages. Most instructions take 0, 1, or 2 operands. An operand can be a destination, a source, or a quantity (count).

A language that is very close to assembly is C. C is almost the little brother of assembly. Understanding C is very helpful, as C is a fairly low-level language in which programmers can use pointers explicitly, just like in assembly.

Let's start with the first instruction in assembly: MOV.

MOV EAX, 1000

The MOV instruction takes a destination and a source operand. This MOV instruction has EAX as its destination and 1000 as its source. In this example, 1000 is an immediate (constant) number. NOTE: Most debuggers and disassemblers show numbers in hexadecimal instead of decimal. Thus, the 1000 here is hexadecimal, which is 4096 in decimal. In C, the code would look like this:

int main()
{
     int a = 0x1000; // 0x1000 is hexadecimal
}
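
If you want to double-check the hexadecimal values used in this tutorial, a tiny helper like the one below is enough. It is only a sketch: parse_hex is a hypothetical name, and it simply calls the standard strtol function with base 16.

```c
#include <stdlib.h>

/* Hypothetical helper: read a debugger-style number string as hexadecimal. */
static long parse_hex(const char *text)
{
    /* Base 16 tells strtol to treat the digits as hexadecimal. */
    return strtol(text, NULL, 16);
}
```

parse_hex("1000") returns 4096, matching the note above.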


With the MOV instruction, you can move data from register to register, from register to memory, and from memory to register. Example of writing to memory:

;Assume that 0x00400000 is already allocated with R/W page access
MOV EAX, 00400000
MOV DWORD PTR [EAX], 500 ;DWORD PTR marks the write as 32 bits wide

Note the brackets. They indicate a memory access at the given address: they tell the machine to treat EAX as a pointer to a memory address. This code puts the address 0x00400000 into EAX, and then sets the 32 bits of data at 0x00400000 to 0x500.

In C:

int main()
{
        int *eax = (int*)0x00400000;
        *eax = 0x500;
}

Running this code will most probably crash your program. Why? 0x00400000 could be an unallocated or read-only address in your process. Accessing it will simply raise a memory access exception (an access violation).

To read data from memory addresses, you could simply do this:

MOV EAX, [00400000]
This reads the 32-bit value at 0x00400000 and puts it into EAX.

In C:

#include <stdio.h>

int main()
{
       int *eax = (int*)0x00400000;
       printf("%d", *eax); // dereference: print the value stored at the address, not the address itself
}
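
Both C examples above use a hard-coded address, so they may crash when you actually run them. Below is a variation that should run safely, assuming we let malloc hand us a valid address instead of 0x00400000. The function name store_and_load is made up for this sketch, and returning -1 on allocation failure is just a convenience here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: write a value through a pointer, then read it back. */
static int32_t store_and_load(int32_t value)
{
    /* Like loading EAX with an address, except this one is known to be valid. */
    int32_t *eax = malloc(sizeof *eax);
    if (eax == NULL)
        return -1;          /* allocation failed; nothing was written */

    *eax = value;           /* write 32 bits to the address held in eax */
    int32_t ebx = *eax;     /* read the same 32 bits back, like MOV EBX, [EAX] */

    free(eax);
    return ebx;
}
```

Calling store_and_load(0x500) should give back 0x500, because the read goes to the same address the write just touched.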

Now for control flows in assembly.

The CMP instruction sets flags, which conditional jump instructions then use to make their decisions. Together they are the equivalent of "if (...) then" in other languages. There is also an unconditional jump instruction, JMP. Conditional jumps are the ones that decide a program's flow. Some of them are:

JE (Jump if equal)
JNE (Jump if not equal)
JL (Jump if less, signed comparison)
JA (Jump if above, unsigned comparison)
And so on. Any x86 instruction reference lists the full set of conditional jumps.
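
To show how the flags connect CMP to the jumps listed above, here is a simplified C model. It is a sketch under my own assumptions: the struct and function names are hypothetical, and only four of the real flags (zero, carry, sign, overflow) are modelled.

```c
#include <stdint.h>

/* Hypothetical model of the flags that CMP updates (a subset of EFLAGS). */
struct flags { int zf, cf, sf, of; };

/* CMP a, b computes a - b, throws the result away, and keeps only the flags. */
static struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;
    struct flags f;
    f.zf = (diff == 0);                          /* zero flag: operands equal  */
    f.cf = (a < b);                              /* carry flag: unsigned borrow */
    f.sf = (int)(diff >> 31);                    /* sign flag: top bit of result */
    f.of = (int)(((a ^ b) & (a ^ diff)) >> 31);  /* overflow flag: signed overflow */
    return f;
}

/* Each conditional jump only looks at the flags, never at the operands. */
static int je(struct flags f)  { return f.zf; }
static int jne(struct flags f) { return !f.zf; }
static int jl(struct flags f)  { return f.sf != f.of; }    /* signed "less"   */
static int ja(struct flags f)  { return !f.cf && !f.zf; }  /* unsigned "above" */
```

Comparing 0xFFFFFFFF with 1 makes both jl and ja true: as a signed number 0xFFFFFFFF is -1 (less than 1), while as an unsigned number it is far above 1. This is why picking the right jump for signed or unsigned data matters.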

Example:
...
...
MOV EAX, 1000
CMP EAX, 1000
JE 00800000
MOV EAX, 0 ; This code will never be executed
...
...
...
00800000:
;Some code here

This example sets EAX to 0x1000 and then compares it with 0x1000. EAX IS equal to 0x1000, so CMP sets the zero flag (ZF = 1), JE sees its condition as TRUE, and the jump is taken. Hence, EAX will never be set to 0, as that instruction is never executed.

C equivalent:

int main()
{
      int eax = 0x1000;
      if (eax == 0x1000) 
      {
            //
      }
      else
      {
            eax = 0; // This will never be executed
      }
}
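
The if/else version hides the jump itself. A sketch that stays closer to the assembly uses goto, so each C line maps to one instruction. The function name compare_and_jump is hypothetical, and I pass the starting value in as a parameter (rather than fixing it at 0x1000) so that both paths can be tried.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the CMP/JE example one instruction per line. */
static uint32_t compare_and_jump(uint32_t eax)
{
    if (eax == 0x1000)      /* CMP EAX, 1000 */
        goto target;        /* JE 00800000: jump taken when equal */
    eax = 0;                /* MOV EAX, 0: only reached when not equal */
target:                     /* 00800000: */
    return eax;
}
```

With 0x1000 as input the jump is taken and the value comes back unchanged; with any other input the function falls through and returns 0.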

That is all I have to teach you for now; some things are better explored on your own. Do read up more on assembly.

Links:
http://ece425web.groups.et.byu.net/stable/labs/8086InstructionSet.html
http://www.jegerlehner.ch/intel/IntelCodeTable.pdf
