The goal of this article is to collect important information about assembly and the basic concepts of reversing. The points are scattered, but I’ll try to keep a cohesive structure.
An important note here is that I’ll be using Intel syntax, so in MOV RBX, RAX the first operand (RBX) is the destination and the second operand (RAX) is the source.
32-bit x86 has 8 general-purpose registers that one can use: EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP.
In 64-bit mode, the old 32-bit registers have been extended to 64 bits as the R registers (RAX, RBX, RSP and so on).
In addition, there are eight extra general-purpose registers, R8 through R15, which can also be accessed as (for example) R8D, R8W and R8B (the lower 32-bit double-word, 16-bit word and 8-bit byte respectively).
The high byte of the old 16-bit registers is still accessible, under many circumstances, as AH, BH, and so on, but R8 through R15 have no high-byte form.
Most applications on most modern operating systems (like FreeBSD, Linux or Microsoft Windows) use a memory model that points nearly all segment registers to the same place, effectively disabling their use. Typically the use of FS or GS is an exception to this rule, instead being used to point at thread-specific data.
On 32-bit Windows, FS:[0x00] is a pointer to the root node of the structured exception handler (SEH) linked list.
Also, a pointer to the PEB (Process Environment Block) is at FS:[0x30]. If it’s dereferenced, it could be an anti-debugging measure, since the PEB contains information about the current process (including the BeingDebugged flag). Here’s a good reference to what the PEB structure contains
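Context is very important when looking at assembly code. Take the five bytes 68 65 6C 6C 6F, which can be read in three ways:
    PUSH 0x6f6c6c65
    "hello"
    448378203247
All the above lines have the same data associated with them (note that the PUSH immediate is stored little-endian, so its bytes appear reversed). The determination of whether information is code or data depends on the context.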
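To make the point concrete, here is a minimal Python sketch that reads the same five bytes as text, as a number and as a 32-bit instruction. It only assumes the standard encoding of PUSH imm32 (opcode 0x68 followed by a little-endian 32-bit immediate); the variable names are mine.

```python
# The five bytes from the example, written out in hex.
raw = bytes.fromhex("68656c6c6f")

# 1) Read as ASCII text.
as_text = raw.decode("ascii")

# 2) Read as one big-endian unsigned integer.
as_number = int.from_bytes(raw, "big")

# 3) Read as 32-bit x86 code: 0x68 is the opcode for PUSH imm32,
#    and the four bytes after it are the immediate, stored little-endian.
PUSH_IMM32 = 0x68
assert raw[0] == PUSH_IMM32
immediate = int.from_bytes(raw[1:5], "little")
as_code = f"PUSH 0x{immediate:08x}"

print(as_text)
print(as_number)
print(as_code)
```

A disassembler makes the same choice for every byte range it sees, which is why data misread as code (or the other way around) is such a common source of confusion.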
Compilers usually use EBP/RBP as a base to reference memory on the stack indirectly. Here are some common patterns:
[EAX] dereferences EAX. That’s a simple dereference with no offset.
[EBP + 0x10] dereferences data on the stack that is most probably a function argument.
[EBP - 0x10] dereferences data on the stack that is most probably a local variable.
[EAX + EBX * 0x8] dereferences an array of 8-byte elements. The format is [Base + counter * size]. EBX is used here as a counter to go through the elements of the array, so to access the 3rd element of this array, EBX would be 0x2.
[EAX + EBX + 0x10] typically dereferences a 2D array (or a field inside an array element). The format is [Base + index + displacement].
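The address arithmetic behind these operands can be sketched in a few lines of Python. The helper name effective_address and the sample register values are assumptions made for illustration; the formula itself (base + index * scale + displacement, with a scale of 1, 2, 4 or 8) is the standard x86 one.

```python
def effective_address(base, index=0, scale=1, disp=0, bits=32):
    """Compute base + index * scale + disp, wrapped to the address size."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows a scale of 1, 2, 4 or 8")
    return (base + index * scale + disp) & ((1 << bits) - 1)

# [EAX + EBX * 0x8] with EAX = 0x1000 and EBX = 2: the 3rd 8-byte element.
third_element = effective_address(0x1000, index=2, scale=8)

# [EBP - 0x10] with EBP = 0x0012FF80: a likely local variable.
local_var = effective_address(0x0012FF80, disp=-0x10)

# [EBP + 0x10] with the same EBP: a likely function argument.
argument = effective_address(0x0012FF80, disp=0x10)

print(hex(third_element), hex(local_var), hex(argument))
```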
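The CMP instruction is an implied SUB. A side effect of SUB EAX, 8 is that it tests whether EAX is equal to 8 and saves the result in a flag (ZF is set when the result is zero). Unfortunately, it also modifies EAX, which we don’t want. CMP EAX, 8 performs the same subtraction and sets the same flags, but discards the result and leaves EAX untouched.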
TEST is an implied AND: it ANDs its operands and sets the flags, but discards the result. It’s used a lot to test whether an operation returned zero. A common pattern looks like this:
    call ds:HTTPSendRequestA   ; Return 0 if failed
    test eax, eax
    jz short loc_xxxx
    ; ----------------------
    ; Success path
    ; ----------------------
    ; ...
    ; ...
    ; ...
    ;
    loc_xxxx:
    ; ----------------------
    ; fail path
    ; ----------------------
Jump Above (JA) and Jump Below (JB) are what’s called ‘unsigned checks’: they treat the compared values as unsigned numbers. Below is an example:
    mov rax, -1   ; rax == -1, i.e. 0xFFFFFFFFFFFFFFFF
    cmp rbx, rax
    ja loc
The above code will never jump, since ja performs an unsigned comparison, so -1 is interpreted as the highest possible unsigned number and rbx can never be above it.
ja looks at the first operand of cmp (rbx) and jumps if it is above the second operand (rax). The signed counterparts are jg (Jump Greater) and jl (Jump Less).
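Here is a rough Python model of the difference between the unsigned check (ja) and its signed counterpart (jg) on 64-bit values. The helper names are made up for this sketch; each function answers "would the jump be taken after cmp a, b?".

```python
BITS = 64
MASK = (1 << BITS) - 1

def to_unsigned(value):
    """Interpret a Python int as a 64-bit unsigned register value."""
    return value & MASK

def to_signed(value):
    """Interpret a Python int as a 64-bit two's complement value."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value

def ja(a, b):
    """Jump Above: taken after cmp a, b when a > b as unsigned numbers."""
    return to_unsigned(a) > to_unsigned(b)

def jg(a, b):
    """Jump Greater: taken after cmp a, b when a > b as signed numbers."""
    return to_signed(a) > to_signed(b)

rax = -1
for rbx in (0, 5, 0x7FFFFFFFFFFFFFFF, -1):
    print(hex(to_unsigned(rbx)), "ja taken:", ja(rbx, rax), "jg taken:", jg(rbx, rax))
```

With rax set to -1, ja is never taken for any rbx, while jg is taken for every non-negative rbx.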
    ; Stack grows downwards (toward lower addresses)
    |OOOO| 0x00000000 LOW MEMORY
    |OOOO|
    |OOOO|
    |OOOO|
    |----| <- ESP
    |loc3| <- EBP - 0x10
    |----|
    |loc2| <- EBP - 0xC
    |----|
    |loc1| <- EBP - 0x8
    |----|
    |loc0| <- EBP - 0x4
    |----| <- EBP
    |SFP |
    |----|
    |RET |
    |----|
    |arg0| <- EBP + 0x8
    |----|
    |arg1| <- EBP + 0xC
    |----|
    |arg2| <- EBP + 0x10
    |----|
    |OOOO|
    |OOOO|
    |OOOO|
    |OOOO|
    |OOOO|
    |OOOO| 0xFFFFFFFF HIGH MEMORY

    evil_func:
        PUSH EBP
        MOV EBP, ESP
        SUB ESP, 0x10    ; Reserve 0x10 bytes for locals < The diagram above shows the stack here >
        PUSH ESI         ; Save a register the function will use (not a local variable)
        PUSH EDI         ; Save a register the function will use (not a local variable)
        PUSH EBX         ; Save a register the function will use (not a local variable)
        PUSH ECX         ; Save a register the function will use (not a local variable)
        ...
        ...
        ...
        POP ECX          ; Restore the saved registers in reverse order
        POP EBX
        POP EDI
        POP ESI
        MOV ESP, EBP     ; Release the local variables
        POP EBP          ; Restore the caller's frame pointer
        ret

    main_func:
        ...
        ...
        PUSH arg2
        PUSH arg1
        PUSH arg0
        CALL evil_func
        ADD ESP, 0xC     ; cdecl cleanup: the caller removes the 3 arguments (3 * 4 bytes)
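The offsets in the diagram follow a simple pattern, sketched below in Python. This assumes a 32-bit frame with a standard PUSH EBP / MOV EBP, ESP prologue and 4-byte arguments and locals; the helper names are hypothetical.

```python
WORD = 4  # bytes per stack slot on 32-bit x86

def arg_offset(index):
    """Offset of argument `index` from EBP: skip the saved EBP and the return address."""
    return 2 * WORD + index * WORD

def local_offset(index):
    """Offset of local `index` from EBP: locals sit just below the saved EBP."""
    return -(index + 1) * WORD

def operand(offset):
    """Format an offset the way it appears in a disassembly listing."""
    sign = "+" if offset >= 0 else "-"
    return f"[EBP {sign} 0x{abs(offset):X}]"

for i in range(3):
    print(f"arg{i}", operand(arg_offset(i)))
for i in range(4):
    print(f"loc{i}", operand(local_offset(i)))
```

Real compilers are free to lay locals out differently (padding, structures, reordering), so treat the negative offsets as a rule of thumb rather than a guarantee.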
There are a lot of function calling conventions, and each one has its own way of handling things like cleaning up the stack and passing arguments. Below is a very non-exhaustive list:
An example of cdecl, where the caller cleans up the stack, is here
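    evil_func:
        PUSH EBP
        MOV EBP, ESP
        SUB ESP, 0x10    ; Reserve 0x10 bytes for local variables
        PUSH ESI         ; Save a register the function will use (not a local variable)
        PUSH EDI         ; Save a register the function will use (not a local variable)
        PUSH EBX         ; Save a register the function will use (not a local variable)
        PUSH ECX         ; Save a register the function will use (not a local variable)
        ...
        ...
        ...
        POP ECX          ; Restore the saved registers in reverse order
        POP EBX
        POP EDI
        POP ESI
        MOV ESP, EBP     ; Release the local variables
        POP EBP          ; Restore the caller's frame pointer
        ret              ; Plain ret: the arguments stay on the stack

    main_func:
        ...
        ...
        PUSH arg2
        PUSH arg1
        PUSH arg0
        CALL evil_func
        ADD ESP, 0xC     ; cdecl cleanup: the caller removes the 3 arguments (3 * 4 bytes)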
There’s a concept in reversing called Stack Neutralization. It refers to the fact that the stack has to become ‘neutral’ before transferring control to another location (e.g. performing a JMP). A good example of this comes up when dealing with packed malware. Before the wrapper can transfer control to the unpacked binary, it has to ‘neutralize the stack’: remove any allocated local variables and/or arguments from the stack and return the base pointer to where it should be. Knowledge of calling conventions is necessary to understand what’s happening with the stack, so that one can trace its state in an effort to find the unpacked malware.
There’s an important note to make here. Many times the compiler will shorten the local variable cleanup into a single LEAVE instruction. So the instructions mov ESP, EBP and then pop EBP that make up the function epilogue are replaced with a simple LEAVE.
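The equivalence between LEAVE and the two-instruction epilogue can be checked with a toy Python model of ESP, EBP and stack memory. The Cpu class and the starting register values are invented for this sketch; it models only what is needed to compare the two epilogues.

```python
class Cpu:
    """Toy 32-bit machine: just ESP, EBP and a dictionary acting as stack memory."""

    def __init__(self, esp, ebp):
        self.esp, self.ebp, self.mem = esp, ebp, {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def prologue(self, local_bytes):
        self.push(self.ebp)        # PUSH EBP
        self.ebp = self.esp        # MOV EBP, ESP
        self.esp -= local_bytes    # SUB ESP, local_bytes

    def mov_esp_ebp_then_pop_ebp(self):
        self.esp = self.ebp        # MOV ESP, EBP
        self.ebp = self.pop()      # POP EBP

    def leave(self):
        # LEAVE performs the same two steps as a single instruction.
        self.esp = self.ebp
        self.ebp = self.pop()

def run(use_leave):
    cpu = Cpu(esp=0x1000, ebp=0x2000)
    cpu.prologue(0x10)
    if use_leave:
        cpu.leave()
    else:
        cpu.mov_esp_ebp_then_pop_ebp()
    return cpu.esp, cpu.ebp

print([hex(v) for v in run(use_leave=False)])
print([hex(v) for v in run(use_leave=True)])
```

Both paths end with ESP and EBP back at their values from before the prologue, which is all the epilogue has to guarantee before the ret.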
Take a look at the function below
    .text:0040611C ; int __cdecl sub_40611C(LPWSTR lpCommandLine, int, int, int, int, int)
    .text:0040611C sub_40611C proc near
    .text:0040611C
    .text:0040611C
    .text:0040611C StartupInfo = _STARTUPINFOW ptr -54h
    .text:0040611C ProcessInformation= _PROCESS_INFORMATION ptr -10h
    .text:0040611C lpCommandLine = dword ptr 8
    .text:0040611C arg_14 = dword ptr 1Ch
    .text:0040611C
    .text:0040611C push ebp
    .text:0040611D mov ebp, esp
    .text:0040611F sub esp, 54h
    .text:00406122 push ebx
    .text:00406123 push esi
    .text:00406124 push edi
    .text:00406125 push 40h
    .text:00406127 xor ebx, ebx
    .text:00406129 lea eax, [ebp+StartupInfo.lpReserved]
    .text:0040612C push ebx
    .text:0040612D push eax
    .text:0040612E mov [ebp+StartupInfo.cb], 44h
    .text:00406135 call sub_40B160
    .text:0040613A xor eax, eax
    .text:0040613C xor edi, edi
    .text:0040613E inc edi
    .text:0040613F add esp, 0Ch
    .text:00406142 cmp [ebp+arg_14], 8
    .text:00406146 mov [ebp+StartupInfo.wShowWindow], ax
    .text:0040614A mov eax, [ebp+lpCommandLine]
    .text:0040614D mov [ebp+StartupInfo.dwFlags], edi
    .text:00406150 jnb short loc_406155
    .text:00406152 lea eax, [ebp+lpCommandLine]
    .text:00406155
    .text:00406155 loc_406155: ; CODE XREF: sub_40611C+34↑j
    .text:00406155 lea ecx, [ebp+ProcessInformation]
    .text:00406158 push ecx ; lpProcessInformation
    .text:00406159 lea ecx, [ebp+StartupInfo]
    .text:0040615C push ecx ; lpStartupInfo
    .text:0040615D push ebx ; lpCurrentDirectory
    .text:0040615E push ebx ; lpEnvironment
    .text:0040615F push 50h ; dwCreationFlags
    .text:00406161 push ebx ; bInheritHandles
    .text:00406162 push ebx ; lpThreadAttributes
    .text:00406163 push ebx ; lpProcessAttributes
    .text:00406164 push eax ; lpCommandLine
    .text:00406165 push ebx ; lpApplicationName
    .text:00406166 call ds:CreateProcessW
    .text:0040616C test eax, eax
    .text:0040616E jnz short loc_406182
    .text:00406170
    .text:00406170 loc_406170: ; CODE XREF: sub_40611C+78↓j
    .text:00406170 push edi
    .text:00406171 xor edi, edi
    .text:00406173 lea esi, [ebp+lpCommandLine]
    .text:00406176 call sub_402D33
    .text:0040617B pop edi
    .text:0040617C pop esi
    .text:0040617D mov al, bl
    .text:0040617F pop ebx
    .text:00406180 leave
    .text:00406181 retn
How many arguments does it have? IDA got confused: it lists two in the metadata (lpCommandLine and arg_14) and six in the function declaration. We’ll have to look at the call sites and the stack cleanup to find out the real number. One call site looks like this:
push instruction and a mov to a location on the stack. Also, the cleanup adds 1Ch to the stack pointer. 1Ch also happens to be the offset of the 2nd argument reported by IDA in line
So what’s the function prologue? It starts at .text:0040611C and ends at .text:00406124. A keen reader would notice that this range includes a few instructions that save register values (push ebx, push esi, push edi). Technically, these are NOT local variables; the compiler is simply saving registers that will be used during this function call.
The same goes for the epilogue, which starts at .text:0040617B and ends at .text:00406181. The POP instructions at the end simply restore the saved registers. The LEAVE instruction is the one responsible for releasing the two local variables we have.
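    evil_func:
        PUSH EBP
        MOV EBP, ESP
        SUB ESP, 0x10    ; Reserve 0x10 bytes for local variables
        PUSH ESI         ; Save a register the function will use (not a local variable)
        PUSH EDI         ; Save a register the function will use (not a local variable)
        PUSH EBX         ; Save a register the function will use (not a local variable)
        PUSH ECX         ; Save a register the function will use (not a local variable)
        ...
        ...
        ...
        POP ECX          ; Restore the saved registers in reverse order
        POP EBX
        POP EDI
        POP ESI
        MOV ESP, EBP     ; Release the local variables
        POP EBP          ; Restore the caller's frame pointer
        RET 0xC          ; stdcall cleanup: the callee removes the 3 arguments (3 * 4 bytes)

    main_func:
        ...
        ...
        PUSH arg2
        PUSH arg1
        PUSH arg0
        CALL evil_func
        MOV dword ptr [c], EAX    ; Move the return value of evil_func to local variable [c]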
    evil_func:
        ; function prolog
        push ebp
        mov ebp,esp
        sub esp,0D8h
        push ebx
        push esi
        push edi
        push ecx
        lea edi,[ebp-0D8h]
        mov ecx,36h
        mov eax,0CCCCCCCCh
        rep stos dword ptr [edi]
        pop ecx
        mov dword ptr [ebp-14h],edx
        mov dword ptr [ebp-8],ecx
        ; return a + b;
        mov eax,dword ptr [a]
        add eax,dword ptr [b]
        ; function epilog
        pop edi
        pop esi
        pop ebx
        mov esp,ebp
        pop ebp
        ret 0x8

    main_func:
        ; put the arguments in the registers EDX and ECX
        mov EDX,3
        mov ECX,2
        ; call the function
        call evil_func
        ; copy the return value from EAX to a local variable (int c)
        mov dword ptr [c],EAX
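You might notice here that there isn’t an ADD ESP, 0x8 or something like that after the call to evil_func to clean up the arguments. In callee-cleanup conventions this is taken care of by ret 8, which returns from the function and then releases 0x8 bytes of arguments from the stack. The same happens in the thiscall and stdcall conventions. Strictly speaking, the ret 0x8 shown above is only correct when 8 bytes of arguments were actually pushed; when both arguments travel in ECX and EDX, nothing was pushed and a compiler would emit a plain ret.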
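The bookkeeping behind caller cleanup and callee cleanup can be traced with a short Python sketch. It assumes 4-byte arguments that are all pushed on the stack; simulate_call and the starting ESP value are hypothetical names chosen for illustration.

```python
WORD = 4  # bytes per pushed argument on 32-bit x86

def simulate_call(nargs, callee_cleans, esp=0x1000):
    """Trace ESP through one call and return (RET operand, caller's ADD ESP amount)."""
    start = esp
    esp -= nargs * WORD            # PUSH each argument
    esp -= WORD                    # CALL pushes the return address
    # ... the callee runs and leaves ESP pointing at the return address ...
    ret_operand = nargs * WORD if callee_cleans else 0
    esp += WORD + ret_operand      # RET / RET N pops the return address (+ N bytes)
    caller_add = start - esp       # what the caller still has to release
    esp += caller_add              # ADD ESP, caller_add (a no-op when it is 0)
    assert esp == start            # the stack is neutral again
    return ret_operand, caller_add

print("cdecl,   3 args:", simulate_call(3, callee_cleans=False))
print("stdcall, 3 args:", simulate_call(3, callee_cleans=True))
print("stdcall, 2 args:", simulate_call(2, callee_cleans=True))
```

Reading this backwards is the useful trick when reversing: an ADD ESP, N after a call, or a ret N at the end of a function, tells you how many bytes of stack arguments the function takes.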
With Microsoft compilers, ECX holds this and the callee cleans up the stack (just like stdcall)
With GCC, this is pushed onto the stack last and the caller cleans up the stack (just like cdecl)
    ; Compiled with a Microsoft compiler
    main_func:
        push 3
        push 2
        lea ecx,[sumObj]
        call evil_func@bunnyFooFoo ; bunnyFooFoo::sum
        mov dword ptr [s4],eax

    evil_func:
        ; function prologue
        push ebp
        mov ebp,esp
        sub esp,0CCh
        push ebx
        push esi
        push edi
        push ecx
        lea edi,[ebp-0CCh]
        mov ecx,33h
        mov eax,0CCCCCCCCh
        rep stos dword ptr [edi]
        pop ecx
        mov dword ptr [ebp-8],ecx
        ; return a + b
        mov eax,dword ptr [a]
        add eax,dword ptr [b]
        ; function epilogue
        pop edi
        pop esi
        pop ebx
        mov esp,ebp
        pop ebp
        ret 0x8
The next article will cover control flow statements.