17 January 2017
This article looks at how we took good C crypto and called it from our Clojure backend and our Clojurescript frontend without changing a single line of the trusted code base.
Before we get started, we should mention that browser crypto is a bad idea if your goal is to fight the Man. It is, on the other hand, a good idea if the goal is to prevent spreading sensitive data across caches and microservices. For example, if we delete a key we could never read (it's encrypted client-side), it is now rendered inert everywhere in our backend.
In the case of Balboa, we never want to see our users' data and we love the literature about crypto providing elegant solutions to access control. Also, since Asm.js keeps getting faster and more available, this may pay greater dividends in future browser releases.
Our big question was the following: “Is it possible to take something developers already trust and call it as faithfully as possible from Clojure and Clojurescript?”
After some research, we had a gut feeling it would look something like the above. It seemed sensible enough, so we set to work finding suitable algorithms for the experiment.
After exploring some potential issues with Emscripten compilation (timing attacks, memset calls being silently removed, etc.), we went looking for a PRF algorithm that would be simple for emcc to compile faithfully. We decided on Skein/Threefish.
Skein's NIST x86 implementation had no strange assembly instructions to worry about and had already gone through the review process of the SHA-3 competition. In a later post, we will talk about some of the tricks we used for NaCl and scrypt, which will cover tradeoffs around when to write Clojurescript or C.
The first hurdle we encountered with JNI and Emscripten is what to do with structs. In the case of something like Skein, there is a struct, Skein1024_Ctxt_t, that contains all of the state for the pseudorandom function. This struct must be initialized and passed around for any incremental hashing operations.
In Emscripten, you can only pass numbers (int, float, void*) or strings across the boundary. In Java, passing objects across the JNI boundary is taxing.
One way to portably share code across the platforms was memcpy'ing structs back and forth into uint8_t* buffers. Once a struct is represented as a uint8_t* and passed up to Clojure or Clojurescript, as a byte[] or a Uint8Array respectively, it can no longer be sensibly mutated there. Fortunately, all modifications happen in C, where the uint8_t* can be memcpy'd back into a struct before operations are done on it.
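The round trip described above can be sketched in a few lines of C. This is a minimal illustration, not the code from skein_shim.c: toy_ctx_t is a hypothetical stand-in for a real context such as Skein1024_Ctxt_t, and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hash context such as Skein1024_Ctxt_t. */
typedef struct {
    uint64_t chain[4];    /* chaining state */
    uint64_t byte_count;  /* bytes absorbed so far */
    uint8_t  pending[8];  /* partially filled block */
} toy_ctx_t;

#define TOY_CTX_BYTES sizeof(toy_ctx_t)

/* Flatten the struct into a caller-owned buffer of TOY_CTX_BYTES bytes. */
void toy_ctx_to_bytes(const toy_ctx_t *ctx, uint8_t *out) {
    assert(ctx != NULL && out != NULL);
    memcpy(out, ctx, TOY_CTX_BYTES);
}

/* Rebuild the struct from the flat buffer. Using memcpy, rather than
 * casting the uint8_t* to a struct pointer, avoids relying on the
 * buffer being suitably aligned for the struct. */
void toy_ctx_from_bytes(toy_ctx_t *ctx, const uint8_t *in) {
    assert(ctx != NULL && in != NULL);
    memcpy(ctx, in, TOY_CTX_BYTES);
}
```

The bytes are only meaningful to the same compiled C code that produced them, since they carry the struct's in-memory layout. That is fine here because the host language treats the buffer as opaque and hands it straight back to C.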
skein_shim.c wraps the conversion between these uint8_t* buffers and structs, at initialization and at teardown, so that the API exposed to both Emscripten and JNI is always uint8_t*. For Emscripten, HEAPU8, the heap in the Asm.js virtual machine, is of type Uint8Array. No translation of the sort done in JNI for byte[] to uint8_t* is required. Instead, a set call is required to bring the buffer that exists outside of Emscripten's heap into HEAPU8: a far simpler task.
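To make the shape of such a shim concrete, here is a hedged sketch of an init/update/final API in which every argument is a uint8_t* or a length. All names (toy_state_size, toy_init, toy_update, toy_final) are hypothetical, and the "hash" is an FNV-1a checksum used purely as a placeholder for the real Skein calls; it is not cryptographic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context; a real shim would wrap Skein1024_Ctxt_t. */
typedef struct {
    uint64_t acc;         /* FNV-1a accumulator, a placeholder, NOT crypto */
    uint64_t byte_count;  /* total bytes absorbed */
} toy_ctx_t;

/* Lets the host size its buffer without knowing the struct layout. */
size_t toy_state_size(void) { return sizeof(toy_ctx_t); }

/* Write a freshly initialized context into the caller's buffer. */
void toy_init(uint8_t *state) {
    assert(state != NULL);
    toy_ctx_t ctx = { 0xcbf29ce484222325ULL, 0 };
    memcpy(state, &ctx, sizeof ctx);
}

/* Incremental step: bytes -> struct, mutate in C, struct -> bytes. */
void toy_update(uint8_t *state, const uint8_t *msg, size_t len) {
    assert(state != NULL && (msg != NULL || len == 0));
    toy_ctx_t ctx;
    memcpy(&ctx, state, sizeof ctx);
    for (size_t i = 0; i < len; i++) {
        ctx.acc = (ctx.acc ^ msg[i]) * 0x100000001b3ULL;
    }
    ctx.byte_count += len;
    memcpy(state, &ctx, sizeof ctx);
}

/* Emit the 8-byte result through a pointer, so no 64-bit value or
 * struct ever has to cross the language boundary by value. */
void toy_final(const uint8_t *state, uint8_t *out8) {
    assert(state != NULL && out8 != NULL);
    toy_ctx_t ctx;
    memcpy(&ctx, state, sizeof ctx);
    memcpy(out8, &ctx.acc, sizeof ctx.acc);
}
```

With this shape, the host side only ever allocates toy_state_size() bytes, passes the same buffer to each call, and reads the result out of out8. Feeding "hello " and then "world" yields the same eight bytes as feeding "hello world" in one call.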
If you want Emscripten code to run quickly, you generally have to set ALLOW_MEMORY_GROWTH=0 at Emscripten-compile-time, forcing you to work with a finite amount of heap. Every call to malloc must be matched by a call to free. Clojurescript offers some really elegant means for controlling your memory usage with Emscripten.
Like most cool things with Clojure(script), it involves a macro:
Manipulating data inside Emscripten requires copying data in and out of its own heap. We used the above methods for bringing Uint8Array back and forth.
For data being passed into the heap, the method is as follows: after malloc'ing space, the slice of HEAPU8 that starts at the returned address is set to the input bytes.
In order to take data out of Emscripten, data is copied out of the heap, back into a regular Uint8Array, by representing the “memory address” and range as a slice of HEAPU8, and copying it into a new buffer. The cloned array is returned, and the allocation of Emscripten heap is free'd upon returning.
Our goal was to make sure this copying back and forth was negligible with respect to the amount of work being done in Emscripten-space.
And, now, the payoff: We can generate the HMAC for a given message! About time, if you ask me.