Stripe CTF3 write-up

I’ve been kinda distracted throughout work for about a week now, because of the third Capture the Flag competition hosted by Stripe. The first CTF was on application security – you had a ‘locked down’ account on a computer, possibly with various programs available. And you had to break out of the account to read a file you weren’t meant to be able to access, containing the password for the next level. I wasn’t aware of that first competition.

The second one was on Web Application security – you were presented with web access to a web app, and the source code for the apps (each level in a different languages), and had to exploit various vulnerabilities in order to get the password for the next level. The web apps ranged from basic HTML form processing, to rudimentary AJAX twitter clones, to APIs for ordering pizza. The vulnerabilities ranged from basic  file upload validation to SHA-1 length extension attacks, Javascript injection all culminating in a level that involved using port numbers to dramatically reduce search space for a 12 character password. I completed that one and won a t-shirt.

The third one, the one that has just happened, was different. It was ‘themed’ around Distributed Systems, rather than security. You’d be given some sample code that you’d have to speed up, either by finding bottle necks in the code or by making the system distributed and fault tolerant. Spoilers will follow. Although the contest is now over, the level code (and associated test harness) is available here if you still want a go. I will note that it’s entirely possible to rewrite each level into a language you’re familiar with (I didn’t take that approach though, given that half the fun is not knowing the language).

So. To details.

I didn’t mange to finish it, although I made it to the last level, of which I sunk far more time into than was healthy – I’m fairly certain my tiredness at work for a couple of days was due to this level.

Level 0

Level 0. A basic Ruby program that reads in text, and if a word appears in a dictionary file it will enclose the word in angle brackets.

I didn’t know Ruby, but I had a good inkling of where to tackle this level, given how simple the program was. A quick google of Ruby Data Structures, and Ruby’s String split() method confirmed my idea. The original code did a string.split() on the dictionary and then repeatedly looked up each word against the Array that function returns. By transforming that array into Ruby’s notion of a Set, I could gain the speedboost from super-fast hash based checking.

I also modified the comparison to do an in place replacement as it saved the cost of duplicating the entire string. I’m unsure how much weight that had against the Array->Set change.

Level 1

A bash script that tries to mine the fictional currency of Gitcoin. Gitcoin is essentially like Bitcoin. You “mine” gitcoins by adding a valid commit to the repository. That commit must modify the ledger file to add one to your own total of gitcoins. A valid commit is one whose commit hash is lexicographically less than the value contained in difficulty – that is to say, if the difficulty contained 00001 your commit hash would have to start with 00000[0-F]. Because of how git works you have to find such a commit before anyone else mining against the same repository finds a valid commit.

There was one main thing in this level to fix. And that’s the call out to git that mock hashes the commit object to see if it’s valid. If it isn’t it alters the commit message text in some way, and then hashes again. This is slow. It’s slow because of a couple of reasons. Git likes to lock its repository files during operations, so you can’t do parallel searches for valid commits. But also because git objects have to have a very specific format, which git takes time to go and generate before returning the hash. The final thing is that each commit contains the hash of the parent commit as part of it, so naturally should another miner find a gitcoin before you, you have to start the search over again.

To achieve this, I moved the SHA1 testing over to Python. I formed the commit object that git creates manually – the header consisting of the term “commit ” and the length of the commit body, with a null byte. I left the body (which itself has to have a very specific format) as it was in the original script. I called pythons SHA1 library to do the work, which is a non-blocking operation, thus meaning I could set 8 separate processes going at once, each trying a unique set of commit messages. Upon success they then spat out a commit message into a file.

Annoyingly my solution then became quite clunky, with myself manually changing the filename to read in a copy of the original script that bypassed the searching. That pushed the correct commit. Ideally I’d have automated that into the first script, but it was enough to get me a valid commit pushed to Stripes git servers, thus meaning the next level was unlocked.

Incidentally this level had a bonus round, where instead of being against four stripe bots mining, you’d be competing against the other players who had completed the level. Needless to say, people very quickly started throwing GPU based SHA1 tools at it, and I was outclassed by a wide degree.

Level 2

Node.js, again I had no experience (although I do know javascript). You were given skeleton code that had to mitigate a DDoS attack. Your code would be placed in front of an under attack web service, and it had to ensure all legitimate requests got through, and strangely enough illegitimate requests to keep the servers busy, but not falling over. (You lost points in the scoring system for how long the target servers were idle.

In practise this was rather simple as the requests were easily differentiated – each legitimate IP would only make a few requests and relatively far apart. Simply keeping a log of the IPs seen and when they were last seen was enough to differentiate the legitimate mice from the illegitimate elephants. You also had to load balance between the two servers that were available – they would fall over if they had to process more than 4 requests at a time. You knew how long it each request would have before the backend servers timed the connection out, so by keeping a log of when each request was proxied, and to which server, you could check how many requests were likely to still be on the server.

Pretty simple.

Level 3

The penultimate level. Scala. I had great troubles with language on this one, I suspect partly because it’s close enough to Java that I get confused mentally when translating what I want to do into the scala syntax.

You were given four servers – a master one, and three slave servers that would never be contacted by the test harness. You were provided with a directory which you had to index all the files under. Then you had to respond to a barrage of search requests (for which you were also expected to return substring matches).

The default code was incredibly poor, so there were some immediate optimisations that were obvious. Firstly, the master server only ever sent search requests to the first of the slave nodes, which also had to index  and search the entire corpus. There’s two approaches now – split the corpus and send each search to all nodes, or split the searches but make each node index the entire corpus. I went with the former. I split the corpus based on root subdirectory number. So the slave0 would index when subDir%3 = 0. Any files directly under the root directory would have been indexed by all nodes.

The second obvious improvement was that the index was an object containing a list of files that the searcher needed to search. That object was serialised to disk, the searcher would read that in. Then for each query it would go off and load the file from disk before searching the file. My first change was to never serialise the object out, but keep it in memory. That didn’t make much of a difference. Then two options presented themselves. I could try constructing an Inverted Index – that would contain each trigram (as I had to handle substring searches)  and a list of the files and lines where that trigram was found. Or I could take the lazy option of reading all the files in at indexing time (you had 4 minutes until the search queries would start) and storing those directly in the in-memory index. I took that option. I transformed the index list into a HashMap of FilePath to Contents.  And that pretty much got me to pass. Somehow. I don’t feel like that was enough work myself, but that was more than made up for by the last level.

Level 4

I couldn’t crack this one. I tried for days. I think it was from Sunday through Wednesday, excepting some time out for the day job.

The language was Go. I know no Go. The challenge: A network of servers, each with a SQLite database. The network is unreliable with lag and jitter randomly added, and network links being broken for seconds at a time. Search queries will be directed to any of the nodes for 30 seconds. All answers they give as a network must be correct. You are disqualified instantly should you return an inconsistent answer. You gain points for each correct response. You lose points for every network byte of network traffic. Oh, and unlike in the other examples, the sample code they provided you with doesn’t pass the test harness – it gets disqualified for inconsistent output.

So. This level was about distributed consensus – how to get multiple nodes to agree on the order of operations given communication problems. I’m just thankful we didn’t also have to contend with malicious nodes trying to join or modify the traffic. If you could get traffic through it was unmodified.

The starter help text contained pointers to a Distributed Consensus Protocol called Raft. Vastly simplifying the intricacies: Nodes elect a leader. Only the leader can make writes to the log (in this case an SQLite Database). The leader will only commit a log once a majority of nodes have confirmed that they have written to the log themselves. If the leader goes missing, the remaining nodes will elect a new leader.

There’s a library already written for Go, Go-Raft. This seemed like a sure fire winner. Just drop in Raft right? Although dropping the library in was very easy it wasn’t that simple. Raft is a very chatty protocol requiring heartbeat signals, leader elections and in our case, request forwarding to the leader as followers do not have authority to commit to the log.

Beyond that though, the go-raft library had issues. It didn’t work with Unix Sockets (that the test harness required) out of the box (although Stripe had had a commit merged into Go-Rafts master branch that made fixing that extremely simple. It could fail to elect a leader. It also had a bug that seemed to bite a lot of people in IRC – I only saw it once, and I’m still not sure on what exactly the cause is – I suspect a missing/misplaced lock() that caused a situation with the log that is fatal for the raft consensus algorithm.

After battling with Unix sockets and getting an excellent passing score locally – at one point I got 150 points normalised, whilst you only needed 50 to pass, I pushed to remote. And it fell over horrendously. I ended up with a negative point score before normalisation. Needless to say that was demoralising. It turns out that reading the original Raft protocol paper, understanding it theoretically, and getting it to work with some easier test cases is very different to getting it to work in a much more hostile set of conditions.

My problems on this level were compounded by the infrastructure regularly falling over and needing the Stripe guys to give the servers a kick or 10.

But beyond that, I feel that there’s something I failed to grok. When my connections could get through it worked fine – SQL was always consistent, leaders were always elected, requests were forwarded properly (barring one case that I have since read about where the request is forwarded and executed successfully but the response is lost due to jitter).  And yet when running on remote I either suffered from End of File errors (i.e. socket closed), or requests timing out. Although I eventually managed to reproduce those issues locally by manually downloading the test case, it didn’t help me in diagnosing the problem – I regularly had a case where one node, in the entire 30 second test runs, never managed to join the consensus (which takes a grand total of one successful request to do). And I didn’t know what to do. I think that the most valuable thing this level taught me, beyond the theory of distributed systems, is how bad I am at fixing problems when there’s no errors that are directly caused by my code. As far as I could tell everything was fine – if I ran it manually without the test harness in the middle it all worked. But when I put the test harness in, it all fell over. My own logic tells me that therefore the problem must be with the test harness. Except I know people managed to pass the level with go-raft. I need to go and look at some solutions people have posted to see how they coped.

At the end of the day, however fun this was overall, the last level left a bad taste in my mouth – the infrastructure problems were pretty endemic especially nearer the end, and the difference between local and remote in the last level was absolutely disheartening. I can accept some difference, but a score that locally is three times higher than the threshold (after normalisation) shouldn’t get negative points on remote. I just wanted the T-Shirt!