Wednesday, January 27, 2010

Reverse Engineering - 2

In the last post, we just felt around a little bit. The main things we understood were:

--Dynamic Runtime Program Analysis
--What reverse engineering is
--Compiling a program and the steps involved

Effectively, when you compile a program, you convert the code you wrote into a form which you can use to do something you couldn't do manually or which would have taken far too much time. When reversing, you only have the final form, the binary/executable, and need to find out exactly what it does. Assuming that you have already done the dynamic analysis that Lenny Zeltser discusses, the next step is to find out as much as you can about the program, the environment it is running in and the other components that make it run. During the first few blog posts I will be referring only to Linux, as I'm far more comfortable with it than Windows.

When a Linux binary is run, it becomes a process which consumes resources on the host. While doing so it receives a PID (Process ID). The details about the various resources that the process consumes are exposed under the /proc directory on Linux. Let's look at the entry for one running process, say sshd (the SSH daemon). Here is what a ps aux listing for sshd gives:
root 1501 0.0 0.2 6064 1080 ? Ss Jan25 0:06 /usr/sbin/sshd

The number 1501 (the PID) will be a directory in /proc. Inside /proc/1501 you'll find details of all the resources that sshd consumes.
cmdline: Contains the command that started the process, with all its parameters. If it's malware that's running, this is a good place to get all the options the malware was started with.
[root@dilby 1501]# more cmdline
/usr/sbin/sshd
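
If you'd rather script this than eyeball it, here is a minimal Python sketch. It relies on the fact that the arguments in cmdline are separated by NUL bytes; the function names parse_cmdline and read_cmdline are my own, chosen just for illustration.

```python
from pathlib import Path
from typing import List


def parse_cmdline(raw: bytes) -> List[str]:
    """Split the raw contents of /proc/<pid>/cmdline into an argument list."""
    if not raw:
        return []  # kernel threads and zombies have an empty cmdline
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # drop the terminating NUL so no empty last item appears
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


def read_cmdline(pid: int) -> List[str]:
    """Read and parse the command line of a running process (Linux only)."""
    return parse_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())
```

Calling read_cmdline(1501) on the box above should give a one-item list containing /usr/sbin/sshd, assuming you have permission to read that entry.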

environ: Shows the environment variables the process was started with. Child processes normally inherit these.
[root@dilby 1501]# more environ
SELINUX_INIT=YESCONSOLE=/dev/console
The environment variables aren't separated clearly here because the entries are delimited by NUL bytes, which the terminal doesn't display. The variables are SELINUX_INIT and CONSOLE; YES and /dev/console are their values. They can be listed clearly as follows:
[root@dilby 1501]# cat /proc/1501/environ | tr '\0' '\n'
SELINUX_INIT=YES
CONSOLE=/dev/console
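
The same NUL-splitting trick can be sketched in Python, turning the file into a dictionary. This is only an illustration; parse_environ is a name I made up.

```python
from typing import Dict


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Turn the raw contents of /proc/<pid>/environ into a name -> value dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the final NUL
        # Split on the first '=' only; values may contain '=' themselves.
        name, _, value = entry.decode(errors="replace").partition("=")
        env[name] = value
    return env
```

Feed it the bytes read from /proc/1501/environ (opened in binary mode) and you get the two variables shown above as a dictionary.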

fd: The file descriptors the process has open, including standard input, output and error. If a process is redirecting output somewhere, you'll see where. Here's a sample listing for process 1501. 0 (input), 1 (output) and 2 (error) all point to /dev/null (the black hole), which suggests this is a daemon. It also has a network socket open, as can be seen from 3 (socket:[some number]).
lrwx------. 1 root root 64 2010-01-29 15:12 0 -> /dev/null
lrwx------. 1 root root 64 2010-01-29 15:12 1 -> /dev/null
lrwx------. 1 root root 64 2010-01-29 15:12 2 -> /dev/null
lrwx------. 1 root root 64 2010-01-29 15:12 3 -> socket:[5715]
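
Each entry in fd is a symbolic link, so the listing can be reproduced with os.listdir and os.readlink. Below is a rough sketch; the function names are mine, and the /dev/null check is only a heuristic for "probably a daemon", not a rule.

```python
import os
from typing import Dict


def list_fds(pid: int) -> Dict[int, str]:
    """Map each open descriptor number of `pid` to what it points at."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were looking
    return targets


def std_streams_discarded(targets: Dict[int, str]) -> bool:
    """Heuristic only: True when fds 0, 1 and 2 all point at /dev/null."""
    return all(targets.get(fd) == "/dev/null" for fd in (0, 1, 2))
```

You will usually need to be root (or the owner of the process) for list_fds to be allowed to read the links.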


If you want to confirm that 5715 is a socket that actually does belong to SSH, you can run netstat as follows.
[root@dilby ~]# netstat -ae | grep -v -i unix
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode
tcp 0 0 *:ssh *:* LISTEN root 5715

Ah, it's an inode. The number in socket:[5715] is the inode of the socket, and netstat -e lists the same inode against sshd's listening port. From now on, any time you want to find out whether a process is doing something over the network, look for socket fds in here.
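
The matching can also be done by hand against /proc/net/tcp, which lists one TCP socket per line with its inode. The sketch below assumes the usual layout of that file (local address in the second column, state in the fourth, inode in the tenth); the helper names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def socket_inode(link_target: str) -> Optional[int]:
    """Return the inode from a link target like 'socket:[5715]', else None."""
    match = SOCKET_LINK.match(link_target)
    return int(match.group(1)) if match else None


def tcp_rows_by_inode(table_text: str) -> Dict[int, Tuple[str, str]]:
    """Map inode -> (local_address, state) from /proc/net/tcp-style text."""
    rows = {}
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue  # not a complete socket row
        # Field 1 is the local address, field 3 the state, field 9 the inode.
        rows[int(fields[9])] = (fields[1], fields[3])
    return rows
```

Both the address and the state come back as the raw hex strings the kernel prints, so they still need decoding before they look like netstat output.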

maps: Describes the memory regions mapped into the process's address space, including those of the libraries it depends on. This will not make much sense just now; it'll help when we get to actually looking at ASM.
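
To give a feel for what is in there: each line of maps has an address range, permissions, a file offset, a device, an inode and (optionally) the file backing the region. Here is a small parsing sketch under that assumption; Mapping and parse_maps_line are names I picked for illustration.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int   # first address of the region
    end: int     # address just past the region
    perms: str   # e.g. 'r-xp' = readable, executable, private
    offset: int  # offset into the backing file
    dev: str     # device as major:minor
    inode: int   # inode of the backing file, 0 if anonymous
    path: str    # backing file or pseudo-name, '' if none


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps."""
    fields = line.strip().split(None, 5)
    start_hex, end_hex = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_hex, 16), int(end_hex, 16), fields[1],
                   int(fields[2], 16), fields[3], int(fields[4]), path)
```

Regions with an x in the permissions are the ones holding code, which is where the disassembly will eventually point.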

status: Provides information about the status of the process. Here's a sample:
Name: sshd
State: S (sleeping)
Tgid: 1501
Pid: 1501
PPid: 1
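
Since status is just "Key: value" lines, it is easy to load into a dictionary. A minimal sketch, with parse_status being my own name for it:

```python
from typing import Dict


def parse_status(text: str) -> Dict[str, str]:
    """Parse the 'Key: value' lines of /proc/<pid>/status into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore anything that is not a 'Key: value' line
            info[key.strip()] = value.strip()
    return info
```

For the sample above, the PPid entry comes back as the string "1", telling you sshd's parent is init.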

Apart from this, there's plenty of other information in the /proc directory. Discussing it at this point won't be too beneficial though, so I'll skip it.

What type of file is it? Is it a known file format? Does it have any dependencies?
Use file or ldd to find out. Here's an example:
[root@dilby 1501]# file ~arvind/a.out
/home/arvind/a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
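
Most of that first line comes straight out of the first 20 bytes of the file, the start of the ELF header. The sketch below decodes just those bytes: the magic, the 32/64-bit class, the byte order, the file type and the machine. It is nowhere near what file really does, the lookup tables are deliberately partial, and describe_elf is a name I made up.

```python
import struct

ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core file"}
MACHINES = {3: "Intel 80386", 62: "x86-64"}  # partial table, for illustration


def describe_elf(header: bytes) -> str:
    """Describe an ELF file from (at least) its first 20 bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    order = {1: "LSB", 2: "MSB"}.get(header[5], "unknown byte order")
    # e_type and e_machine are 16-bit fields at offsets 16 and 18,
    # stored in the byte order the header itself declares.
    layout = ">HH" if header[5] == 2 else "<HH"
    e_type, e_machine = struct.unpack_from(layout, header, 16)
    kind = ELF_TYPES.get(e_type, "unknown type")
    machine = MACHINES.get(e_machine, f"machine {e_machine}")
    return f"ELF {bits} {order} {kind}, {machine}"
```

To try it, open the binary in "rb" mode, read the first 20 bytes and pass them in.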

If ldd gives you a long list, it is saying: this is a dynamically linked file and needs these libraries on your system to function properly. Here's an example:
[root@dilby 1501]# ldd ~arvind/a.out
linux-gate.so.1 => (0x00110000)
libc.so.6 => /lib/libc.so.6 (0x004c0000)
/lib/ld-linux.so.2 (0x004a1000)

If it were statically compiled (all libraries prepackaged into the binary) then you'd get very different messages.
[root@dilby arvind]# gcc -static a.c
[root@dilby arvind]# file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.6.9, not stripped
[root@dilby arvind]# ldd a.out
not a dynamic executable

For an initial analysis of the file on disk, that'll do. Next, you probably want to check whether the program is communicating with the network around it. The socket inode we discussed above is one way. Another is to use lsof and netstat to look at active connections. Key options for netstat are:
-a - All connections (listening and non-listening)
-n - Show addresses and ports as numbers instead of resolving names
-p - Program using that connection
-e - Includes the inode number used by that connection
-l - Only listening connections
-r - Current routing table
-c - Repeat the netstat output continuously. Useful when you want to check for new connections.
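
On Linux the TCP part of this comes from /proc/net/tcp, where addresses and states are printed in hex. As a rough idea of what something like netstat -l -n -e has to do, here is a sketch that decodes those fields and keeps only listening sockets. It assumes IPv4, a little-endian machine such as x86, and the standard column layout; the state table is partial and all names are my own.

```python
import socket
import struct
from typing import List, Tuple

TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "0A": "LISTEN"}  # partial


def decode_address(hex_addr: str) -> Tuple[str, int]:
    """Turn '0100007F:0016' into ('127.0.0.1', 22)."""
    hex_ip, hex_port = hex_addr.split(":")
    # Assumption: little-endian host, so the IP bytes are stored reversed.
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def listening_sockets(table_text: str) -> List[Tuple[str, int, int]]:
    """Return (ip, port, inode) for every LISTEN row of /proc/net/tcp text."""
    found = []
    for line in table_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 10 or TCP_STATES.get(fields[3]) != "LISTEN":
            continue
        ip, port = decode_address(fields[1])
        found.append((ip, port, int(fields[9])))
    return found
```

IPv6 sockets live in /proc/net/tcp6 with a longer address format, which this sketch does not handle.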

Running tcpdump or Wireshark at runtime is also helpful for viewing greater detail. Here is a great cheat sheet for the same. We'll start dipping our feet into assembly language next time.

Tuesday, January 26, 2010

Reverse Engineering - 1

We started discussing reverse engineering here. What we will do in this first post is take a gentle look at a little terminology that we'll encounter down the road. I won't touch Windows right now, because the basics are best learnt using the open source tools that are available on Linux systems. The only requirement, hence, is a Linux system. Ubuntu works well, although all the necessary tools can be found on Red Hat or probably any other Unix-like system as well.

Before doing that, however, what I'd like you to do is think about how you could analyze a trojan. The first thing that comes to mind is: run it and see what it does. After all, there's nothing like seeing it in action, right? There are a couple of problems with that which even a beginner like me can think of:

a) You need to be very careful that it doesn't damage any other systems.
b) There are numerous hidden mechanisms that might not be activated by just running it.

Problem a) could possibly be solved by carefully creating an isolated environment and ensuring that the system doesn't interact at all with the outside world. Problem b) is a toughie though: unless you have the code of the malware in front of you, you can't be sure that you found everything.

The advantage, though, is that you get a bird's-eye view of a lot of the key features of a trojan, something that would have taken much longer had you sat down with a million lines of assembly code. This approach of analyzing a trojan at runtime is called Dynamic Code Review. While this series will primarily focus on understanding malware through assembly language, it is a great idea to run through Lenny Zeltser's Introduction to Malware course first. Once you're done, continue reading the rest of this post.

Caught your eye, didn't it? Not surprised at all ;). Great, now that you have a fair idea of what to expect with malware, let's get down to understanding actual reversing via assembly language. The only structured free work I could find online was over here. That guide, while very cool, is a little difficult to follow at times. So what I'm going to do is use it as a base and elaborate wherever needed, so we get the maximum possible benefit and learn as much as we can. I'm going to shamelessly link there (like I did above) wherever I feel that I cannot put things any better than they already have. The whole idea really is to get the flow of learning this subject absolutely right. Well, let's go now!

Chapters 1 and 2 are very well written; they are great introductions to the nuts and bolts of the subject. Nothing to add here, just go ahead and read both of them and drop back here.

Ok great. At this point I'm just going to go over what we must be clear on before we move forward.
--- What reverse engineering is and what you are in for.

--- An understanding of the compilation process of a C program, including all the terminology used there. Since you don't want to keep referring back to all those basic definitions, which are very important nonetheless, I made a glossary sheet which I will keep adding to as I learn more.

Chapter 3 talks about getting a lot of information about the processes that run on your system. I will discuss that in the next part, going into just a little more detail than Chapter 3 does. Stick around.

Reverse Engineering - Introduction

Reverse Engineering - Series

I've been trying to learn Reverse Engineering for quite a while now. Granted, it's one of the tougher subjects to learn, but the literature that is out there is also not very well organized. I have invariably found myself giving up on it somewhere down the line due to the lack of direction on how to proceed. What I am trying to do now is start right from the basics yet again. This time I plan to document the approach much better than I have done before, so that at least the next time I have some kind of reference point to start from. I am not sure how long this will take or how many parts it will contain. All I plan to do here is put down my learnings in an organized fashion, so people new to this field do not struggle as much as I have and do not go down all the wrong paths of learning.

There are a few things that I have always got out of all those Reverse Engineering tutorials I have read. Here is the list:
a) RTFM - Politely tells you to read a lot
b) Learn how to debug - Here people will rave a lot about SoftICE, OllyDbg and W32Dasm and give examples
c) Learn assembly programming using NASM or something else - Will point you to a book on assembly programming
d) Understand all the Intel syntax for instructions - Will point you to an Intel site
e) Solve crackmes - Little executables put together with a little bit of protection which you have to break
f) Examples - Many people will show you how they cracked something

Well, all of this is no doubt correct. But for a person like me, it's still all too directionless, and there is no one best way to learn all this. What to take first? How to begin? I know I always had those questions in my mind and still do. However, I have now started on a path that I hope is correct. Over the next few articles I hope to blog as I learn. I'm still a novice, so do point out the mistakes I make and I'll correct them as I go on.