Tuesday, November 10, 2009

Fooling loader!

Why would someone need to fool a loader to skip loading some parts of the binary? Well, I have my reasons and won't write them here now. In case you need to do the same, read on to learn how you can do it:

First, let me explain what we are trying to do: make the loader skip some parts of the .text section. On contemporary Linux systems, executable files are built in the ELF format. The loader reads the ELF headers and loads the program or shared libraries into memory as directed by these headers. ELF has several kinds of headers, but only the program headers matter for our purposes.

We use readelf to query the ELF information of an executable file. The -l option lists the program headers.
> readelf -l program

Elf file type is EXEC (Executable file)
Entry point 0x8048680
There are 8 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
  INTERP         0x000134 0x08048134 0x08048134 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x10300 0x10300 R E 0x1000
  LOAD           0x010300 0x08059300 0x08059300 0x00148 0x00150 RW  0x1000
  DYNAMIC        0x01031c 0x0805931c 0x0805931c 0x000e0 0x000e0 RW  0x4
  NOTE           0x000148 0x08048148 0x08048148 0x00020 0x00020 R   0x4
  GNU_EH_FRAME   0x01016c 0x0805816c 0x0805816c 0x0005c 0x0005c R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4


Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag
06 .eh_frame_hdr
07
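
For readers who would rather script this than read it off readelf, here is a minimal Python sketch that pulls the same program header table out of a file. It assumes a 32-bit little-endian ELF like the one above; the function name read_program_headers is made up for this example, and the field offsets are those of the standard Elf32 header (which is 52 bytes long, matching the "starting at offset 52" line in the output).

```python
import struct

# Elf32_Phdr fields, in order:
# p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align
PHDR_FORMAT = "<8I"   # little-endian, eight 32-bit words (32 bytes per entry)
PT_LOAD = 1           # p_type value that readelf prints as LOAD


def read_program_headers(image):
    """Return one 8-tuple per program header of a 32-bit little-endian ELF image."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if image[4] != 1 or image[5] != 1:
        raise ValueError("this sketch only handles 32-bit little-endian ELF")
    # e_phoff is a 32-bit word at byte 28 of the ELF header;
    # e_phentsize and e_phnum are 16-bit fields at bytes 42 and 44.
    (e_phoff,) = struct.unpack_from("<I", image, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 42)
    return [
        struct.unpack_from(PHDR_FORMAT, image, e_phoff + i * e_phentsize)
        for i in range(e_phnum)
    ]
```

Reading the file with open(path, "rb").read() and keeping only the tuples whose first field equals PT_LOAD should give the two LOAD lines shown above.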


We are interested in the program headers that are marked as 'LOAD'.
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x08048000 0x08048000 0x10300 0x10300 R E 0x1000
This line tells the loader to map the region [0x0 - 0x10300) of the file to virtual memory at [0x8048000 - 0x8058300). This region occupies 0x10300 bytes on disk (file size); it will also occupy the same amount in memory. It is readable and executable, but not writable.
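
The arithmetic behind that reading is just "start plus size" on both sides. A tiny Python sketch (load_ranges is a name invented for this example) makes it explicit:

```python
def load_ranges(offset, vaddr, filesz, memsz):
    """Return the half-open (start, end) ranges a LOAD entry covers on disk and in memory."""
    file_range = (offset, offset + filesz)
    mem_range = (vaddr, vaddr + memsz)
    return file_range, mem_range


# Values taken from the first LOAD entry above.
file_range, mem_range = load_ranges(0x000000, 0x08048000, 0x10300, 0x10300)
print([hex(x) for x in file_range])   # ['0x0', '0x10300']
print([hex(x) for x in mem_range])    # ['0x8048000', '0x8058300']
```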

If you want to avoid loading a specific part into memory, you can modify these LOAD entries accordingly.

"How am I gonna edit this?", you ask. You can read the ELF specification (this may not be the official specification, but it is accurate), then open the executable in a hex editor, count bytes, and edit them by hand. Or, you can refer to my previous blog entry on a binary editor called HT Editor. You should prefer the latter ;)

To fool the loader into not loading a specific part of the file into memory, we need to open up a hole in the loadable region. That requires splitting one of the LOAD entries into two, which means we need one more program header entry than we have. There is no free space for one right after the existing table (not exactly true, since we could move all program headers to the end of the file and use as much space as we need; but for now, let's stay away from that since it is too much of a hassle).

See the NOTE entry in the program headers? My understanding is that you can remove it if you want; at least that was what I did, and the world did not collapse. Using the HT Editor, copy the DYNAMIC entry over the NOTE entry, and then the second LOAD entry over the old DYNAMIC entry. Now we have an open slot. At this point, the program headers should look like this:


Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
  INTERP         0x000134 0x08048134 0x08048134 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x10300 0x10300 R E 0x1000
  LOAD           0x010300 0x08059300 0x08059300 0x00148 0x00150 RW  0x1000
  LOAD           0x010300 0x08059300 0x08059300 0x00148 0x00150 RW  0x1000
  DYNAMIC        0x01031c 0x0805931c 0x0805931c 0x000e0 0x000e0 RW  0x4
  GNU_EH_FRAME   0x01016c 0x0805816c 0x0805816c 0x0005c 0x0005c R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

Note that the third LOAD entry is a duplicate of the second.
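
If you prefer scripting the shuffle over doing it by hand in the editor, the two copies can be sketched in Python on an in-memory copy of the file. This is only an illustration under assumptions: the slot numbers are hard-coded from the readelf listing of this particular binary, each entry is 32 bytes (Elf32), the table starts at offset 52, and the helper names copy_entry and free_a_slot are made up here.

```python
PHDR_SIZE = 32  # size of one Elf32 program header entry in bytes


def copy_entry(image, phoff, src_index, dst_index):
    """Overwrite program header slot dst_index with a copy of slot src_index."""
    src = phoff + src_index * PHDR_SIZE
    dst = phoff + dst_index * PHDR_SIZE
    image[dst:dst + PHDR_SIZE] = image[src:src + PHDR_SIZE]


def free_a_slot(image, phoff=52):
    """Drop NOTE and leave a duplicate of the second LOAD entry in slot 4."""
    # Slots in the original table: 2 = first LOAD, 3 = second LOAD,
    # 4 = DYNAMIC, 5 = NOTE. The order of the two copies matters.
    copy_entry(image, phoff, 4, 5)   # DYNAMIC overwrites NOTE
    copy_entry(image, phoff, 3, 4)   # second LOAD overwrites the old DYNAMIC slot
```

Here image is expected to be a bytearray holding the whole file; write it back to a copy of the executable, not the original, so you can start over if something goes wrong.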

Now let's modify the first and second LOAD entries to make a hole. Say I want the loader to load 2 pages, then skip 5 pages, and continue with loading the rest of the program. We need to modify the first LOAD entry so that the loader won't load the whole program; changing its size fields is enough. Assuming the page size is 0x1000, the file size and mem size fields of the first entry should be set to 0x2000 (2 pages). Then we skip the next 5 pages. The second LOAD should start from the 8th page, which starts at 0x7000, so the offset field of the second entry should be set to 0x7000. It will be mapped to a memory location that is offset by 0x7000 from the start of the original segment, which is 0x8048000, so both the virt. addr. and phys. addr. fields should be set to 0x804f000. Since the total size of the original first segment was 0x10300, the size fields of the second entry should be 0x9300 (0x10300 - 0x7000). One more thing: the second entry started life as a copy of the data segment and still carries its RW flags. It now covers code, so set its flags to R E as well; otherwise the code in it may not be executable on systems that enforce non-executable pages. In the end, the LOAD entries should look like this:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x02000 0x02000 R E 0x1000
  LOAD           0x007000 0x0804f000 0x0804f000 0x09300 0x09300 R E 0x1000
  LOAD           0x010300 0x08059300 0x08059300 0x00148 0x00150 RW  0x1000
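
The numbers above can be double-checked with a few lines of Python. This is a sketch of the arithmetic only, assuming a page size of 0x1000; split_load and its parameter names are invented for this example.

```python
PAGE = 0x1000  # assumed page size


def split_load(offset, vaddr, size, keep_pages, skip_pages, page=PAGE):
    """Split one LOAD entry in two, leaving a hole of skip_pages pages after keep_pages pages."""
    first_size = keep_pages * page
    resume = (keep_pages + skip_pages) * page   # distance from the segment start
    if resume >= size:
        raise ValueError("hole extends past the end of the segment")
    first = (offset, vaddr, first_size)
    second = (offset + resume, vaddr + resume, size - resume)
    return first, second


# Original first LOAD entry: offset 0x0, address 0x08048000, size 0x10300.
first, second = split_load(0x0, 0x08048000, 0x10300, keep_pages=2, skip_pages=5)
print([hex(x) for x in first])    # ['0x0', '0x8048000', '0x2000']
print([hex(x) for x in second])   # ['0x7000', '0x804f000', '0x9300']
```

Each tuple is (offset, address, size); the address goes into both the virt. addr. and phys. addr. fields, and the size into both size fields.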

At this point, let me tell you that if your program ever needs to execute an instruction that is not loaded, it will crash with a SIGSEGV. So, don't mess with program headers if you are not willing to handle the segmentation violations.
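
Before running the edited binary, it can be worth checking that the addresses you know will be executed (the entry point, for a start) do not fall into the hole. Below is a rough Python sketch; is_loaded is a made-up helper, and it ignores the fact that the kernel maps whole pages, so it is slightly stricter than reality near segment ends.

```python
def is_loaded(addr, segments):
    """Return True if addr lies inside any (vaddr, memsz) range in segments."""
    return any(vaddr <= addr < vaddr + memsz for vaddr, memsz in segments)


# (VirtAddr, MemSiz) of the three LOAD entries after the edit.
segments = [(0x08048000, 0x02000), (0x0804f000, 0x09300), (0x08059300, 0x00150)]

print(is_loaded(0x08048680, segments))   # entry point from the readelf output: True
print(is_loaded(0x0804a000, segments))   # first page of the hole: False
```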

Wednesday, August 26, 2009

Executable file editor (ELF editor)

If you need to edit executable files, HT Editor is your friend!

Download it from their website, build it, and run it. I use it specifically to edit ELF program and section headers. Run ht. To access the menu items at the top, I had to use the Esc key (Esc + F to access the File menu, for example). To access the menu items at the bottom, use the function keys: you will see "6 mode" at the bottom, so hit F6 to change the mode.

To edit ELF headers, first start the program, then load the executable (F3 - Open), then hit F6 (mode), select elf/header (or any other part that you are interested in), scroll to where you want to edit, hit F4 to switch to edit mode, do your edits, and hit F2 to save. Note that you can't run the program from another shell while working on it in edit mode; either switch to view mode again (F4), or close HT Editor.

Friday, August 7, 2009

Profiling 101

Quick profiling tips. Nothing really new here. I am just putting down this info so that I can find it later on :D

In order to profile your executable, compile and link the code with the -g and -pg options. Note that -pg has to be passed at link time too, not only when compiling.

Once you create an executable file, just run it as you normally would. It will create a gmon.out file when it exits. Then run 'gprof program gmon.out' (gprof needs the executable as well as the profile data) and voila! Here is the profiling data. You might want to check the flags for gprof.
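
Put together, the whole session is three commands. The Python sketch below only builds the command lines; it assumes gcc as the compiler and a single C source file, and profiling_commands, the source name and the executable name are all placeholders for this example.

```python
def profiling_commands(source, executable):
    """Return the three command lines of a basic gprof session, in order."""
    return [
        # Compile and link in one step; -pg is needed for both.
        ["gcc", "-g", "-pg", source, "-o", executable],
        # Run the program; gmon.out is written when it exits normally.
        ["./" + executable],
        # gprof takes the executable first, then the profile data file.
        ["gprof", executable, "gmon.out"],
    ]
```

Each list can be handed to subprocess.run(cmd, check=True) in turn, or simply typed into a shell.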

If your executable forks a new process that you want to profile as well, or if it is an MPI program, you will soon realize that you only get one gmon.out file when you run your program: the gmon.out produced by the last process to exit overwrites the earlier ones. The workaround is an environment variable called GMON_OUT_PREFIX. Set it to 'gmon.out-'`/bin/uname -n`. For bash, add this line to your .bashrc file:

export GMON_OUT_PREFIX='gmon.out-'`/bin/uname -n`

This way, every process will create its own gmon.out version, i.e. a file that starts with gmon.out followed by the machine name and pid (process id).
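
Once you have a pile of per-process files, you may want to gather them and feed them to gprof together. The Python sketch below is one way to do that, assuming the files are named prefix, a dot, then the pid, as described above. The helper names are made up; gprof's -s option, which sums the listed profile files into a file called gmon.sum, is real.

```python
import glob
import os


def per_process_profiles(prefix, directory="."):
    """Return the sorted profile files named '<prefix>.<pid>' found in directory."""
    pattern = os.path.join(glob.escape(directory), glob.escape(prefix) + ".*")
    # Keep only names whose last dot-separated component is a number (the pid).
    return sorted(p for p in glob.glob(pattern) if p.rsplit(".", 1)[1].isdigit())


def gprof_sum_command(executable, profiles):
    """Build a gprof command that merges the given profile files into gmon.sum."""
    return ["gprof", "-s", executable] + list(profiles)
```

After running the merge command, 'gprof program gmon.sum' reports on the combined data.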

What if you want to profile a shared library? Again, compile and link the shared library with the -g and -pg options. Then, set the LD_PROFILE environment variable to the name of the shared library you want to profile. Finally, run the executable which uses this shared library. Upon completion, a file named after the library with a .profile suffix (something like libfoo.so.profile for a library called libfoo.so) will be created in /var/tmp. Use the sprof program to get the profile information for the shared library.

I also read that you can use LD_PROFILE_OUTPUT to specify a directory other than /var/tmp, but I haven't used this myself.
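
For reference, here is how the pieces of the shared-library case fit together, as a Python sketch that only prepares the environment and the sprof command line. It assumes the library's file name is also its soname, so that the profile file ends up as that name plus .profile in the output directory; shared_library_profile_plan and the example paths are hypothetical.

```python
import os


def shared_library_profile_plan(library, output_dir="/var/tmp"):
    """Return (environment, profile_path, sprof_command) for profiling one shared library."""
    name = os.path.basename(library)
    env = dict(os.environ)
    env["LD_PROFILE"] = name                 # which library the dynamic linker should profile
    env["LD_PROFILE_OUTPUT"] = output_dir    # where the .profile file is written
    profile_path = os.path.join(output_dir, name + ".profile")
    # sprof takes the shared object first, then the profile data file.
    sprof_command = ["sprof", library, profile_path]
    return env, profile_path, sprof_command
```

Run the executable with the returned environment (for example via subprocess.run(cmd, env=env)), then run the sprof command once it has finished.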

You might ask how one can profile multiple shared libraries. I really don't know an easy answer.

P.S. The information provided above is valid for GNU tools. I haven't tried any of it with other toolchains.