andyfriesen.github.io

Crux - Records

2016-06-08T00:00:00+00:00

This is my second post about Crux. You may want to read the first one first.

Records are a pretty fundamental idea in every programming language. It’s important to get them right.

We have some requirements:

Records have to be easy to understand.
Crux must be a delightful programming environment on the web too, so records must be easy to use when working with foreign JavaScript APIs.
Mutability needs to be convenient and predictable.

I think we’ve come up with something that’s both novel and hits all the sweet spots.

Row Polymorphism

First off, records in Crux are what we call row polymorphic. This means that a record is no more or less than the set of fields it has. If two records have the same fields, and those fields have the same types, then they have the same record type. This is a form of structural typing, an idea that OCaml and TypeScript also make use of.

This is in stark contrast to languages like C# and Java, where a type declaration adds a sort of identity to the data type. This is what we call nominal typing, and Crux supports this as well. (I’ll get to this another day.)

In a structural type system, we don’t care so much about exact matches. Instead, we just care that a value has the properties that a particular function needs. For instance, we might write a hypotenuse function for points:

fun hypot(point) {
    sqrt(point.x * point.x + point.y * point.y)
}

The point parameter of this function clearly needs to have an x and a y, but we haven’t said anything about what other properties it might have. In Crux, like TypeScript, it doesn’t matter:

let named_point = {
    name: "My House",
    x: 122.4194,
    y: 37.7749
}
let h = hypot(named_point)

This is ok. As long as the argument satisfies the required properties, additional properties are allowed.
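For readers who know TypeScript, the same idea can be sketched there. This is only a rough analogue and not Crux itself: the HasXY interface and the namedPoint value are hypothetical names, and TypeScript needs the parameter annotation that Crux would infer.

```typescript
// Hypothetical TypeScript analogue of the Crux hypot example.
// HasXY names only the fields the function actually needs.
interface HasXY {
    x: number;
    y: number;
}

function hypot(point: HasXY): number {
    return Math.sqrt(point.x * point.x + point.y * point.y);
}

// The extra 'name' field does not stop this value from being used as a HasXY.
const namedPoint = { name: "My House", x: 3, y: 4 };
const h = hypot(namedPoint);
```

Passing the record through a variable matters in TypeScript: an object literal with extra fields written directly in the call would trip its excess-property check.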

Mutability

Immutable values are fantastic things to have around. They’re so much easier to reason about. We’ve done a lot of work both in environments where things are mutable by default and in environments where they are immutable by default, and the latter is quite a lot better in the long term.

We’ve also worked in environments where mutability is a fair bit less convenient to get at, and we’d really prefer to be on the other side of that fence.

To that end, we wanted Crux to afford easy access to immutable data, but with a convenient way to strip that off and start changing things.

We use type inference to sort all of this out.

You can mutate a record field just like you think you should:

let named_point = {
    name: "My House",
    x: 122.4194,
    y: 37.7749
}
named_point.name = "This name is much better"

One thing you can do in Crux is to explicitly declare record fields to be mutable or immutable. Presently, we do this with a type annotation. We might add syntax to make this easier.

// Define a little type alias, for brevity
type NamedPoint = {
    const name: String,
    mutable x: Number,
    mutable y: Number
}

let named_point : NamedPoint = {
    name: "The Greatest Point",
    x: 999,
    y: 999
}

These annotations are optional, and if you don’t specify one, the type inference engine will figure it out.

fun zero_out(point) {
    point.x = 0
    point.y = 0
}
let my_point = { x: 3, y: 2 } // x and y must be mutable
zero_out(my_point)

So far we’ve talked about mutable and immutable record fields, but there is actually a third state which we haven’t figured out a name for yet. It is a record field that isn’t mutated in the current scope, but may or may not be mutable in other scopes.

The reason is that we can easily prove that a function requires a mutable field, but we can never prove that a mutable field is forbidden. Consider our first example:

fun hypot(point) {
    sqrt(point.x * point.x + point.y * point.y)
}

Either a mutable or an immutable x and y will work just fine. Every record field is thus one of the following:

Mutable,
Immutable, or
An immutable view into a value which might be mutable. This is very similar to const in C++.

The really nice thing about this scheme is that the type inference engine will generally stay out of your way until you put a type annotation on a record field.
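The third state has a loose counterpart in TypeScript's readonly modifier, which may help make it concrete. The sketch below is an analogy under stated assumptions, not a description of how Crux works: Point, ReadonlyPoint, zeroOut and hypot are hypothetical names, TypeScript has no inference of field mutability, and it has no truly immutable state.

```typescript
// A record whose fields may be mutated.
interface Point {
    x: number;
    y: number;
}

// A read-only view: this function promises not to mutate the fields,
// but says nothing about whether other code can.
interface ReadonlyPoint {
    readonly x: number;
    readonly y: number;
}

function zeroOut(p: Point): void {
    p.x = 0;
    p.y = 0;
}

function hypot(p: ReadonlyPoint): number {
    return Math.sqrt(p.x * p.x + p.y * p.y);
}

const myPoint: Point = { x: 3, y: 4 };
const before = hypot(myPoint); // a mutable record seen through a read-only view
zeroOut(myPoint);              // the same record, mutated elsewhere
const after = hypot(myPoint);  // the view observes the change
```

The point of the sketch is that hypot works with either kind of record, while zeroOut demands mutable fields.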

JavaScript

Lastly, Crux will not be delightful to use if it’s difficult to talk to JavaScript code. To make this easy, we promise that Crux will obey two rules:

A Crux record maps exactly to a JavaScript object, and
Calling a function on a Crux record always generates the code for a JS method call.

Let’s look at a simple example. Say we want to run this function:

fun main() {
    document.body.insertBefore(
        document.createTextNode("Hello!"),
        document.body.firstChild
    )
}

First off, Crux doesn’t yet know anything about browser APIs. We’ll add this to the standard library someday, but for now, we need to build our own. :)

The document object is always in scope on a web page, so we’ll use the declare construct to tell the compiler that it exists.

declare document : Document

No code is generated from this declaration. It’s just a promise to the compiler.

Next, we need to define the Document type:

type Document = {
    createTextNode: (String) -> Node,
    body: {
        insertBefore: (Node, Node) -> Node,
        firstChild: Node
    }
}

data Node {}

Astute readers might ask about what this means in relation to JS prototypes and method dispatch, and the answer is quite simple: Crux has no awareness whatsoever of these things. We promise that document.createTextNode("Hello!") in Crux will generate the JS document.createTextNode("Hello!"), but how that JS statement will be executed is left up to the JS engine.

Note here that we also defined a Node type, but didn’t say anything at all about its composition. This is an easy way to make a data type that has no user-inspectable parts. You can think of it as an inscrutable baton that gets passed around.
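The same shape can be sketched in TypeScript for comparison. Everything here is an assumption made for illustration: OpaqueNode, Doc and greet are hypothetical names, the brand field is just one common way to fake an opaque type, and the document is passed in as a parameter (rather than declared as a global) so the sketch can run outside a browser.

```typescript
// An opaque handle: callers can pass it around but learn nothing about it.
interface OpaqueNode {
    readonly __brand: "OpaqueNode";
}

// Only the members this program uses are described, as in the Crux type above.
interface Doc {
    createTextNode: (text: string) => OpaqueNode;
    body: {
        insertBefore: (newNode: OpaqueNode, reference: OpaqueNode | null) => OpaqueNode;
        firstChild: OpaqueNode | null;
    };
}

// Each call below is an ordinary method call on the object it is written against.
function greet(doc: Doc): OpaqueNode {
    return doc.body.insertBefore(
        doc.createTextNode("Hello!"),
        doc.body.firstChild
    );
}
```

Because the types are erased at compile time, the interfaces produce no JavaScript of their own, which is the same spirit as the declare promise above.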

You can try it yourself in our online playground.

Crux - A Programming Language for People

2016-05-19T00:00:00+00:00

Chad Austin and I have been working on a programming language for the past 6 months or so. It is still not sufficiently stable that I’d recommend it for actual production use, but we’ve done enough that we think it might be interesting to language nerds.

Crux arose from a lot of research and personal experience on both our parts.

JavaScript

To start, we both have a lot of experience dealing with large, old code bases written in dynamic languages. We had the privilege of working with some tremendously smart, motivated people who all wanted to do the right thing, but we were nevertheless left feeling unsatisfied with the amount of work it takes to get good reliability and agility out of dynamic languages.

JavaScript is absolutely not the language we’d like to build our web applications in.

Haskell

Secondly, we’ve also got a lot of experience on the opposite extreme: we have both written quite a lot of production Haskell. We love the fidelity of Haskell’s type system and how it helps real humans write good software that can still change even when it is large and old, but we found the human factors to leave something to be desired:

In order to do anything with any data type, you have to move your cursor to the top of the file and add an import statement. Larger modules require dozens of imports. We’ve seen over a hundred imports in a single source file.
Haskell is lazy. Because of this, a lot more is required of the compiler to get reasonable code, and even then, it’s easy for a well-meaning person to write code that allocates far more memory than expected (a “space leak”).
There is a JS backend for Haskell, but the code it generates is very large (960kb for Hello World!) and is almost impossible for a human being to understand.

Haskell is great (we’re using it to author the compiler!), but it’s far from perfect, and we can’t use it on the web anyway.

OCaml

There exists a spectacular JS backend for OCaml called js_of_ocaml. It generates fast, somewhat readable JS, and the OCaml language itself is remarkably well thought out.

The problem is that (and I must stress that I think this is regrettable) OCaml will never become a popular mainstream language, and it has nothing to do with OCaml’s theoretical soundness.

OCaml is culturally tone-deaf:

Arrays use the syntax [| 1 ; 2 ; 3 ; 4 |]. Linked lists use the syntax [1 ; 2 ; 3 ; 4].
Tuples use , and do not require parens. The expression [1 , 2] is actually a list containing a single tuple.
OCaml has no overloading. Adding integers is done with the + operator; to add floats, the +. operator must instead be used.
OCaml has objects, but the method access operator is #, not . or ->, e.g. document#createElement "text"
Mutable data is created with the ref function. This looks great. Assignment, however, uses :=. Reading a ref cell requires using the ! operator, much like you use the * operator in C to dereference a pointer, e.g. x := !x + 1

Crucially, there are very good historical and technical reasons why all of these things are the way they are, but contemporary programmers don’t look at that. We see let a = [|1; 2; 3|];; and we’re done. No further justification is necessary.

OCaml is a surprisingly adept language for the web, but it can never be more than a tiny niche.

TypeScript

Lastly, we looked at TypeScript.

TypeScript looks as though it is purpose-made to be a success among JavaScript developers.

Almost all of its syntax is instantly recognizable to people coming from JS, C#, or Java, and it has a stellar story for working with untyped JS: the TypeScript compiler is designed from the start to do very little more than perform extra type checking. If you strip the types from TypeScript, you get JavaScript.

Unfortunately, preexisting JS doesn’t necessarily map to any kind of sane static type system, so TypeScript is intentionally unsound. By this I mean that it is possible to write a valid TypeScript program that incorrectly uses a value of one type as though it has some other (unrelated) type.
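One well-known instance of this is array covariance. The sketch below is an illustration with hypothetical names (Animal, Dog, demo), not an example taken from the TypeScript documentation. As far as I know it is accepted by the type checker, yet it calls a method that does not exist at runtime.

```typescript
interface Animal {
    name: string;
}

interface Dog extends Animal {
    bark(): string;
}

function demo(): string {
    const dogs: Dog[] = [];
    // Accepted: a Dog[] may be used where an Animal[] is expected.
    const animals: Animal[] = dogs;
    // Accepted: the literal is a valid Animal, but it is not a Dog.
    animals.push({ name: "Tom" });
    try {
        // Type checks, because every element of dogs is assumed to be a Dog.
        return dogs.map((d) => d.bark()).join(",");
    } catch (e) {
        return e instanceof TypeError ? "TypeError at runtime" : "other error";
    }
}
```

The program never uses a cast, so the mistake is only discovered when the code runs.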

TypeScript also wound up repeating the Billion Dollar Mistake.

Now, it’s certainly the case that TypeScript is a killer solution if you specifically have a preexisting JS application that you need to improve incrementally, but I think unsoundness and pervasive nullability fatally compromise a system’s resilience to change.

TypeScript is what I want to move my aging JS codebase to, but it’s not where I want to start, if I have any choice.

Crux

From these, we arrive at Crux’s key pillars:

Crux helps you write programs that are still easy to change when they are old and large
Compiled Crux is small, fast, and has predictable performance
Crux looks like contemporary programmers expect

I’ll go into more detail about what this means in upcoming posts.


Haskell Basics: How to Loop

2015-12-18T00:00:00+00:00

One of the things that really gets newcomers to Haskell is that it’s got a vision of flow control that’s completely foreign. OCaml is arguably Haskell’s nearest popular cousin, and even it has basic things like while and for loops.

Throw in all this business with endofunctors and burritos and it’s pretty clear that a lot of newcomers get frustrated because all this theoretical stuff gets in the way of writing algorithms that they already know how to write. In other languages, these newcomers are experts and they are not at all used to feeling lost.

As a preface, I’m not going to explain how monads work, and I’m not going to explain any of the historical anecdotes that explain why these things are the way they are. This territory is incredibly well-trod by others.

Additionally, many of the things that I’ll describe here are non-idiomatic Haskell, but none create design-wrecking maintenance or performance problems. I think it’s better that newcomers write “ugly” code that works than it is that they learn all of functional programming all at once. :)

Pure Loops

If your loop doesn’t require side effects, the thing you’re actually after is some kind of transform. You want to turn a sequence into something else by walking it.

Transforming Elements

If you just want to transform each element of a collection, but you don’t want to change the kind of collection (or its length!), you probably want a map. The function is called map and has this signature:

map :: (a -> b) -> [a] -> [b]

If you don’t have a list, but instead have a Vector, Map, deque or whatever, you can use its more general cousin fmap:

fmap :: Functor f => (a -> b) -> f a -> f b

Accumulating (aka folding)

Consider this simple JS:

function count(anArray) {
    var result = 0;
    for (var i = 0; i < anArray.length; ++i) {
        result += anArray[i];
    }
    return result;
}

This clearly isn’t a map. The result isn’t an array at all. It’s something else.

When you want to walk an array and build up a value like this, use a fold. The Haskell function you should start with is called foldl', found in the Data.Foldable module (Data.List exports it too). The above transliterates to this Haskell:

import Data.Foldable (foldl')

-- Like the JS above, this adds up the elements of the list.
count l =
    let accumulate acc el = el + acc
    in foldl' accumulate 0 l

foldl' takes a function, an initial value, and the collection to walk. The function you pass in receives the result that has been computed so far and the next element to merge in, and returns the new result.
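If you are arriving from JS, it may help to see the same shape written with reduce. This is a TypeScript sketch for comparison only. The names mirror the Haskell above, and the argument order of the callback (accumulator first, then element) happens to match foldl'.

```typescript
// The fold from above, written with Array.prototype.reduce.
function count(numbers: number[]): number {
    // Receives the result so far and the next element, returns the new result.
    const accumulate = (acc: number, el: number): number => el + acc;
    // reduce takes the combining function and the initial value.
    return numbers.reduce(accumulate, 0);
}
```

The loop counter and the mutable result variable from the original JS are both gone; the fold carries the running value for you.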

Accumulations that exit early sometimes

Edited: Updated this section per feedback from lamefun. Thanks!

Consider this:

function indexOf(list, element) {
    for (var i = 0; i < list.length; ++i) {
        if (list[i] == element) {
            return i;
        }
    }
}

This is superficially similar to what we were doing above, but we want to stop looping when we hit a certain point.

When the builtin traversals don’t obviously provide something you actually want, the end-all solution is the tail-recursive loop.

This is the most manual way to loop in Haskell, and as such it’s the most flexible.

indexOf' list element =
    let step l index = case l of
            [] -> Nothing
            (x:xs) ->
                if x == element
                    then Just index
                    else step xs (index + 1)
    in step list 0

The pattern you want to follow is to write a helper function that takes as arguments all the state that changes from iteration to iteration. When you want to update your state and jump to the start of the loop, do a recursive call with your new, updated arguments.

The only thing to worry about is to ensure that your recursive call is in tail position. The compiler will optimize tail calls into “goto” instructions rather than “calls.”

Impure Loops

Just Plain Doing Stuff

Control.Monad (like Data.Foldable) exports a function called forM_, which takes a foldable data structure and a monadic function, runs the action on each element, and discards the results.

This is as close to a C++-style for() loop as you’re going to get.

import Control.Monad (forM_, when)

main = do
    forM_ [1..100] $ \number -> do
        putStr $ show number ++ " "
        when (0 == number `mod` 3) $
            putStr "Fizz"
        when (0 == number `mod` 5) $
            putStr "Buzz"
        putStrLn ""

Mapping

If you drop the underscore and use forM instead, you can capture the results.

import Control.Monad (forM)

main = do
    strings <- forM [1..5] $ \number -> do
        putStr $ "Enter string " ++ show number ++ ": "
        getLine

    print strings

Accumulating

Honestly, if it’s impure, you can just create an IORef. IORefs are mutable variables in Haskell.

import Control.Monad (forM_)
import Data.IORef (newIORef, modifyIORef', readIORef)

main = do
    let increment n = n + 1

    count <- newIORef 0

    forM_ [0..50] $ \number -> do
        modifyIORef' count increment

    c <- readIORef count
    print c

Better Accumulating

foldM is exactly analogous to foldl', except it’s monadic. This means that you can use it to perform side effects in your loop body as you accumulate values.

import Control.Monad (foldM)

main = do
    let l = [0..4]
    let iter acc element = do
            putStrLn $ "Executing side effect " ++ show element
            return (acc + element)
    total <- foldM iter 0 l
    putStrLn $ "Total is " ++ show total

Accumulation with early termination

Just like with pure code, when libraries don’t seem to offer what you want, just write out the tail-recursive loop. The only difference is that monadic functions generally have to return some value in non-recursive cases. If you just want to do stuff and don’t have a result you want to carry back, return (). Think of it as an empty tuple.

main = do
    let test a_list = case a_list of
            [] ->
                return ()
            (x:xs) -> do
                putStrLn $ "Testing element " ++ show x
                if 0 == x `mod` 3
                    then return ()
                    else test xs
    test [1..10]

Here, our test function splits apart the list it is given, and stops if the list is empty or if the current element is evenly divisible by 3. If not, it tail recurses with the rest of the list.

Something useful to observe here is that we are, in a certain sense, effecting a “mutable variable” by way of the recursive call. The parameter “shrinks” with each successive recursive step.

This is also the most flexible way to write a loop. Anything you can do in C, you can do in Haskell by way of variations on this template.

Testable IO in Haskell

2015-06-17T00:00:00+00:00

At IMVU, we write a lot of tests. Ideally, we write tests for every feature and bugfix we write. The problem we run into is one of scale: if each of IMVU’s tests were 99.9% reliable, 1 out of every 5 runs would result in an intermittent failure.
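The arithmetic behind that kind of claim can be sketched as follows, assuming tests fail independently of one another. The function name and the test count of 223 are illustrative assumptions, not figures from IMVU.

```typescript
// Probability that at least one test in a run fails intermittently,
// assuming every test passes with the same probability, independently.
function suiteFailureProbability(perTestReliability: number, testCount: number): number {
    return 1 - Math.pow(perTestReliability, testCount);
}

// With 99.9% reliable tests, a couple of hundred tests is already enough
// to push the chance of a spurious failure to roughly one run in five.
const oneInFive = suiteFailureProbability(0.999, 223);
```

Under these assumptions the failure rate only gets worse as the suite grows, which is why per-test reliability has to be far better than 99.9%.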

Tests erroneously fail for lots of reasons: the test could be running in the midst of the “extra” daylight-savings hour or a leap day (or a leap second!). The database could have been left corrupted by another test. CPU scheduling could prioritize one process over another. Maybe the random number generator just so happened to produce two zeroes in a row.

All of these things boil down to the same root cause: nondeterminism within the test.

We’ve done a lot of work at IMVU to isolate and control nondeterminism in our test frameworks. One of my favourite techniques is the way we make our Haskell tests provably perfectly deterministic.

Here’s how it works.

This post is Literate Haskell, which basically means you can point GHC at it directly and run it. You can download it here.

We’ll start with some boilerplate.

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE NamedFieldPuns #-}

module Main where

import Control.Monad.State.Lazy as S

What we’re looking to achieve here is a syntax-lightweight way of writing side effectful logic in a way that permits easy unit testing.

In particular, a property we’d very much like to have is the ability to deny our actions access to IO when they are running in a unit test.

For this example, we’ll posit that the very important business action we wish to test is to prompt the user for their name, then say hello:

importantBusinessAction = do
    writeLine "Please enter your name: "
    name <- readLine
    if "" == name
        then do
            writeLine "I really really need a name!"
            importantBusinessAction
        else
            writeLine $ "Hello, " ++ name ++ "!"

We’ll achieve this by defining a class of monad in which testable side effects can occur. We’ll name this class World.

class Monad m => World m where
    writeLine :: String -> m ()
    readLine :: m String

We can now write the type of our importantBusinessAction:

importantBusinessAction :: World m => m ()

This type can be read as “an action producing unit for some monad m in World.”

When our application is running in production, we don’t require anything except IO to run, so it’s perfectly sensible for IO to be a context in which World actions can be run. The Haskell Prelude already offers the exact functions we need, so this instance is completely trivial:

instance World IO where
    writeLine = putStrLn
    readLine = getLine

In unit tests, we specifically want to deny access to any kind of nondeterminism, so we’ll use the State monad. State provides the illusion of a mutable piece of data through a pure computation. We’ll pack the state of our application up in a record.

type FakeIO = S.State FakeState

(I’ll get to FakeState in a second)

Aside from reliability, this design has another very useful property: It is impossible for tests to interfere with one another even if many tests share the same state. This means that “test fixtures” can trivially be effected by simply running an action and using the resulting state in as many tests as desired.

The state record FakeState itself essentially captures the full state of the fake application at any one moment.

The writeLine implementation is very easy: We just need to accumulate a list of lines that were printed. We can carry that directly in our state record.

The readLine action is a bit more complicated. We’re going to write all kinds of tests for our application, and we really don’t want to burn any one particular behaviour into the framework. We want to parameterize this on a per-test basis.

We’ll solve this by embedding an action directly into our state record.

data FakeState = FS
    { fsWrittenLines :: [String]
    , fsReadLine     :: FakeIO String
    }

def :: FakeState
def = FS
    { fsWrittenLines = []
    , fsReadLine = return ""
    }

Now, given this record, we can declare that FakeIO is also a valid World Monad, and provide implementations for our platform when run under unit test.

instance World (S.State FakeState) where
    writeLine s = do
        st <- S.get
        let oldLines = fsWrittenLines st
        S.put st { fsWrittenLines = s:oldLines }

    readLine = do
        st <- S.get
        let readLineAction = fsReadLine st
        readLineAction

We also write a small helper function to make unit tests read a bit more naturally:

runFakeWorld :: b -> State b a -> (a, b)
runFakeWorld = flip S.runState

Now, let’s write our first unit test.

We wish to test that our application rejects the empty string as a name. When the user enters an empty name, we wish to verify that they see an error message and are asked again for their name.

First, we’ll craft a readLine implementation that produces the empty string once, then the string “Joe.”

Making this function more natural without compromising extensibility is left as an exercise to the reader. :)

Note that by providing the type FakeIO String, we have effectively authored an action that can only be used in a unit test. The build will fail if production code tries to use this action.

main :: IO ()
main = do
    let readLine_that_is_incorrect_once :: FakeIO String
        readLine_that_is_incorrect_once = do
            S.modify (\s -> s { fsReadLine = return "Joe" })
            return ""

Now that we have that, we can create a FakeState that represents the scenario we wish to test.

    let initState = def
            { fsReadLine = readLine_that_is_incorrect_once }

And go!

    let ((), endState) = runFakeWorld initState importantBusinessAction

Note that runFakeWorld produces a pair of the result of the action and the final state. We can inspect this record freely:

    forM_ (reverse $ fsWrittenLines endState) $ \line ->
        print line

That’s it!

In a real application, your FakeState analogue will be much more complex, potentially including things like a clock, a pseudo-random number generator, and potentially state for a pure database of some sort. Some of these things are themselves complex to build out, but, as long as those implementations are pure, everything snaps together neatly.

If complete isolation from IO is impractical, this technique could also be adjusted to run atop a StateT rather than pure State. This allows for imperfect side-effect isolation where necessary.

Happy testing!

Source Code.

What does unique_ptr<> cost?

2014-10-07T00:00:00+00:00

I just watched Jonathan Blow’s proposal for a new programming language, which got me thinking about the difficulties that motivated the talk.

In particular, I think unique_ptr<> is fantastic, but I’m curious about how it affects compile times and code size. Let’s find out.

First, some C++

#include <cstdio>  // for printf
#include <memory>

using std::unique_ptr;

struct S {
    unique_ptr<int[]> ints;
};

int main() {
    const int LEN = 50;
    auto s = S {
        unique_ptr<int[]> { new int[LEN] }
    };

    auto j = 0u;
    for (auto i = 0u; i < LEN; ++i) {
        s.ints[i] = ++j;
    }

    auto sum = 0;
    for (auto i = 0u; i < LEN; ++i) {
        sum += s.ints[i];
    }

    printf("Sum! %i\n", sum);
}

Next, the equivalent C:

#include <stdlib.h>
#include <stdio.h>

typedef struct S {
    int* ints;
} S;

#define LEN 50

int main() {
    S s;
    unsigned i, j;
    int sum;

    s.ints = (int*)malloc(LEN * sizeof(int));

    j = 0u;
    for (i = 0u; i < LEN; ++i) {
        s.ints[i] = ++j;
    }

    sum = 0;
    for (i = 0u; i < LEN; ++i) {
        sum += s.ints[i];
    }

    printf("Sum! %i\n", sum);

    free(s.ints);
}

On my machine (a mid-2012 Retina MBP), I get these compile times, averaged over 5 runs of clang and clang++ each. We’ll look at code size too:

c1.c:      37ms / 8kb
cpp1.cpp: 123ms / 8kb

Wow! What’s going on?

First off, using clang++ to build the C version runs at the same speed. That should have been obvious, but I wanted to test it anyway.

Secondly, if I add #include <memory> to the C version and run it through clang++, I see the same 123ms build times.

So, that’s probably most of the picture, but my mental model of templates is that you pay for them in two ways:

When you #include the header, you pay the time it takes for the compiler to parse that header, and
you pay again to instantiate the template with a particular set of types.

So, what’s the instantiation cost?

Let’s try this:

struct S0 { int a0; S0(int i) : a0(i) {} }; auto u0 = unique_ptr<S0> { new S0(0) };
struct S1 { int a1; S1(int i) : a1(i) {} }; auto u1 = unique_ptr<S1> { new S1(1) };
// 998 more!

vs this

struct S0 { int a0; S0(int i) : a0(i) {} }; auto u0 = new S0(0);
struct S1 { int a1; S1(int i) : a1(i) {} }; auto u1 = new S1(1);
// 998 more!

// also
delete u0;
delete u1;
// et cetera

The results:

1000_structs.cpp:       1303ms / 79kb
1000_unique_ptrs.cpp:   6668ms / 185kb

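Wow! Dividing the differences by 1000, it looks like each unique_ptr<> costs about 5ms of compile time ((6668 - 1303) / 1000) and about 100 bytes of executable ((185 - 79)kb / 1000).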

First, let’s look at compile speed. I wonder if it has to do with the number of unique (heh) unique_ptr<> instantiations, or mere utterance of the type name. If we change all the unique_ptr<>s so they have the same type, we get:

1000_identical_unique_ptrs.cpp: 599ms / 79kb

Wait, what? Why is it faster than doing it the hard way? Shouldn’t our build times be worse because we’re asking it to expand a bunch of extra templates?

Also note that file size matches up with what we get when we deallocate explicitly: Our executable gets larger with the number of kinds of unique_ptr<>s we instantiate, but it doesn’t cost anything to use the same kind of pointer many times. This makes sense: the full implementation shouldn’t be much more than a deleted copy constructor, a move constructor, and a destructor.

Could it be that all those delete statements cost 120ms? What happens if we remove them?

1000_struct_no_free.cpp: 667ms / 63kb

This is unexpected: a very boring, monomorphic built-in language construct costs more to use than a template class.

We still need to look into the code size:

$ clang++ -S -Os -std=c++11 1000_unique_ptrs.cpp
$ emacs 1000_unique_ptrs.s

I see this over and over:

    .private_extern __ZNSt3__110unique_ptrI2S1NS_14default_deleteIS1_EEED1Ev
    .globl  __ZNSt3__110unique_ptrI2S1NS_14default_deleteIS1_EEED1Ev
    .weak_def_can_be_hidden __ZNSt3__110unique_ptrI2S1NS_14default_deleteIS1_EEED1Ev
    .align  1, 0x90
__ZNSt3__110unique_ptrI2S1NS_14default_deleteIS1_EEED1Ev: ## @_ZNSt3__110unique_ptrI2S1NS_14default_deleteIS1_EEED1Ev
    .cfi_startproc
## BB#0:
    pushq   %rbp
Ltmp7:
    .cfi_def_cfa_offset 16
Ltmp8:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp9:
    .cfi_def_cfa_register %rbp
    movq    %rdi, %rax
    movq    (%rax), %rdi
    movq    $0, (%rax)
    testq   %rdi, %rdi
    je  LBB1_1
## BB#2:                                ## %_ZNKSt3__114default_deleteI2S1EclEPS1_.exit.i.i
    popq    %rbp
    jmp __ZdlPv                 ## TAILCALL
LBB1_1:                                 ## %_ZNSt3__110unique_ptrI2S1NS_14default_deleteIS1_EEED2Ev.exit
    popq    %rbp
    retq

FYI, the tail call at the end is the global operator delete:

$ c++filt __ZdlPv
operator delete(void*)

It looks like the destructor isn’t being inlined. That’s a shame. I wasn’t able to coerce clang into inlining it. (it already has the __always_inline__ attribute)

Recap

You pay a tiny bit of constant overhead just to #include <memory>
You pay a bit for each distinct specialization of unique_ptr<>, but it’s cheaper than what you pay for an explicit delete statement.
You pay a bit of filesize for each kind of unique_ptr<>.
It’s basically free to talk about many unique_ptr<>s of the same type.

Code

Haskell at IMVU

2014-03-25T00:00:00+00:00

This is a copy of the article I wrote for the IMVU Engineering Blog.

Since early 2013, we at IMVU have used Haskell to build several of the REST APIs that power our service.

When the company started, we chose PHP as our application server language, in part, because the founders expected the website to only be a small part of the business! IMVU was primarily about a downloadable 3D client. We needed “a website or something” to give users a place to download our client from, but didn’t expect it would have to be much more than that. This shows that predicting the future is hard. Years later, we have quite a lot of customers, and we primarily use PHP to serve them. We’re big enough that we run multiple subteams on separate initiatives at the same time. Performance is becoming important to us not just because it matters to our customers, but because it can easily make the difference between buying 4 servers and buying 40 servers to support some new feature.

So, early in 2012, we found ourselves ready to look for an alternative that would help us be more rigorous. In particular, we were ready for the idea that sacrificing a tiny bit of short term, straight-line time to market might actually speed us up in the long run.

How We Got Here

I started learning Haskell in my spare time in part because Haskell seems like the exact opposite of PHP: Natively compiled, statically typed, and very principled.

My initial exploration left me interested in evaluating Haskell at real scale. A year later, we did a live-fire test in which we taught multiple teammates Haskell while delivering an important new feature under a deadline.

Today, a lot of our backend code is still driven by PHP, but we have a growing amount of Haskell that powers newer features. The process has been exciting not only because we got to actually answer a lot of the questions that keep many people from trying Haskell, but also because it’s simply a better solution.

The experiment to start developing in Haskell took a lot of internal courage and dedication, and we had to overcome a number of quite rational concerns related to adopting a whole new language. Here are the main ones and how they worked out for us:

Scalability

The first thing we did was to replace a single service with a Haskell implementation. We picked a service that was high-volume but was not mission critical.

We didn’t do any particular optimization of this new service, but it nevertheless showed excellent performance characteristics in the field. Our little Haskell server was running on a pair of spare servers that were otherwise set for retirement, and despite this, each machine was handling about 20x as many requests as one of our high-spec PHP servers could manage.

Reliability

The second thing we did was to take our hands off the Haskell service and leave it running until it fell over. It ran for months without intervention.

Training

After the reliability test, we were ready to try a live fire exercise, but we had to wait a bit for the right project. We got our chance in early 2013.

The rules of the experiment were simple: Train 3 engineers to write the backend for an important new project and keep up with a separate frontend team. Most of the code was to be new, so there was relatively little room for legacy complications.

We very quickly learned that we had also signed up for a lot of catch-up work to bring the Haskell infrastructure in line with what we’ve had for years in PHP. We were very busy for a while, but once we got this infrastructure out of the way, the tables turned and the front-end team became the limiting factor.

Today, training an engineer to be productive in our Haskell code is not much harder than training someone to be productive in our PHP environment. People who have prior functional programming knowledge seem to find their stride in just a few days.

Testing

Correctness is becoming very important for us because we sometimes have to change code that predates every current developer. We have enough users that mistakes become very costly, very quickly. Solving these sorts of issues in PHP is sometimes achievable but always difficult. We usually solve them with unit tests and production alerts, but these approaches aren’t sufficient for all cases.

Unit tests are incredible and great, but you’re always at the mercy of the level of discipline of every engineer at every moment. It’s easy to tell your teammates to write tests for everything, but this basically boils down to asking everyone to be at their very best every day. People make mistakes and things slip through the cracks.

When using Haskell, we actually remove an entire class of defects that we have to write tests for. Thus, the number of tests we have to write is smaller, and thus there are fewer cases we can forget to write tests for.

We like unit testing and test-driven development (TDD) at IMVU and we’ve found that Haskell is better with TDD, but also that TDD is better with Haskell. It takes fewer tests to get the same degree of reliability out of Haskell. The static verification takes care of quite a lot of error checking that has to be manually implemented (or forgotten) in PHP. The Haskell QuickCheck tool is also a wonderful help for developers. The way Haskell separates pure computations from side effects let us build something that isn’t practical with other languages: We built a custom monad that lets us “switch off” side effects in our tests. This is incredible because it means that trying to escape the testing sandbox breaks compilation. While we have had to fight intermittent test failures for eight years in PHP (and at times have had multiple engineers simultaneously dedicated to the problem of test intermittency), our unit tests in Haskell cannot intermittently fail.

Deployment

Deployment is great. At IMVU, we do continuous deployment, and Haskell is no exception. We build our application as a statically linked executable, and rsync it out to our servers. We can also keep old versions around, so we can switch back, should a deployment result in unexpected errors.

I wouldn’t write an OS kernel in it, but Haskell is way better than PHP as a systems language. We needed a Memcached client for our Haskell code, and rather than try to talk to a C implementation, we just wrote one in Haskell. It took about a half day to write and performs really well. And, as a side effect, if we ever read back some data we don’t expect from memcached (say, because of an unexpected version change) then Haskell will automatically detect and reject this data.

We’ve consistently found that we unmake whole classes of bugs by defining new data types that wrap primitive types like integers and strings. For instance, we have two lines of code that say that “customer IDs” and “product IDs” are represented to the hardware as numbers, but they are not mutually convertible. Setting up these new types doesn’t take very much work and it makes the type checker a LOT more helpful. PHP, and other popular dynamic server languages like JavaScript or Ruby, make doing the same very hard.

Refactoring is a breeze. We just write the change we want and follow the compile errors. If it builds, it almost certainly also passes tests.

Not All Sunshine and Rainbows

Resource leaks in Haskell are nasty. We once had a bug where an unevaluated dictionary was the source of a space leak that would eventually take our servers down. We also ran into an issue where an upstream library opened /dev/urandom for randomness, but never closed the file handle. These issues don’t happen in PHP, with its process-per-request model, and they were more difficult to track down and resolve than they would have been in C++.

The Haskell package manager, Cabal, ended up getting in the way of our development. It lets you specify version ranges of particular packages you want, but it’s important for everyone on the team to have exactly the same versions of every package. That means controlling transitive dependencies, and Cabal doesn’t really offer a way to handle this precisely. For a language that is so very principled on type algebra, it’s surprising that the package manager doesn’t follow suit regarding package versioning. Instead, we use Cabal for basic package installation, and a custom build tool (written in Haskell).

Hiring

I’ll admit that I was very worried that we wouldn’t be able to hire great people if our criterion was expertise in an uncommon language with a comparatively sparse industrial track record, but the honest truth is that we found a great Haskell hacker in the Bay Area after about 4 days of looking.

We had a chance to hire him because we were using Haskell, not in spite of it.

Final Thoughts

While it’s usually difficult to objectively measure things like choice of programming language or software stack, we’re now seeing fantastic, obvious productivity and efficiency gains. Even a year later, all the Haskell code we have runs on just a tiny number of servers and, when we have to make changes to the code, we can do so quickly and confidently.

Literate Haskell and Jekyll

2014-03-23T00:00:00+00:00

I think I’ve more or less decided to switch my blog over to use Github Pages and Jekyll. It’s pretty neat and it means I can sleep safer knowing that I’m not inadvertently inflicting unpatched PHP on some poor unsuspecting web host.

One of the things that’s kind of annoying, though, is that Jekyll won’t correctly handle Literate Haskell out of the box. You can drop code into a Markdown document and it will even syntax highlight it, but it doesn’t support any syntax that also happens to line up with Literate Haskell.

Clearly, this is a problem in need of a self-referential solution.

{-# LANGUAGE OverloadedStrings #-}
module Main where

import System.IO (stdin, stdout, hIsEOF)
import Control.Monad (when, unless)
import qualified Data.Text as T
import Data.Text.IO (hGetLine, hPutStrLn)

We run a very basic state machine: each line is either part of a Literate Haskell block, or it isn’t.

data State = LHS | Prose
    deriving (Eq)

processNextLine :: State -> IO ()
processNextLine state = do
    eof <- hIsEOF stdin

    -- UPDATE 3 April 2014: My program had a bug! It would not close the last
    -- Markdown group if the last line of the input program was Haskell code and
    -- not prose.
    when (eof && state == LHS) $
        hPutStrLn stdout "```"

    unless eof $ case state of
        LHS   -> processLhsLine
        Prose -> processProseLine

When looking at prose, all we need to do is watch out for lines that have bird tracks. If it’s anything else, spit the line out verbatim.

processProseLine :: IO ()
processProseLine = do
    line <- hGetLine stdin
    if hasBirdTrack line
        then
            switchToLhs line
        else do
            hPutStrLn stdout line
            processNextLine Prose

LHS is pretty much identical, but we need to strip the bird track from each line as we read it.

processLhsLine :: IO ()
processLhsLine = do
    line <- hGetLine stdin
    if hasBirdTrack line
        then do
            hPutStrLn stdout (stripBirdTrack line)
            processNextLine LHS
        else
            switchToProse line

To flip between states, print out a line that signifies Haskell or not-Haskell, and resume in the new state.

It’s kind of funny that we call this sort of recursion “functional.” In this program, it looks an awful lot like “goto.” :)

switchToLhs :: T.Text -> IO ()
switchToLhs line = do
    hPutStrLn stdout "```haskell"
    hPutStrLn stdout (stripBirdTrack line)
    processNextLine LHS

switchToProse :: T.Text -> IO ()
switchToProse line = do
    hPutStrLn stdout "```"
    hPutStrLn stdout line
    processNextLine Prose

hasBirdTrack :: T.Text -> Bool
hasBirdTrack line = ">" == line || "> " == T.take 2 line

stripBirdTrack :: T.Text -> T.Text
stripBirdTrack line = T.drop 2 line

main :: IO ()
main = processNextLine Prose

And that’s it!

Source

Using C++ Macros to Inline Repetitious Code

2009-02-11T00:00:00+00:00

One of the neat things the IMVU client has is balls-awesome crash handling. We do a number of things to ensure that, any time our client crashes for any reason, we find out as much as we can about the crash. This serves a bunch of purposes: we use the raw number of crashes to determine whether or not we have broken something, and we use all the information we can get out of the crashes (stack traces, memory dumps, log files, and the like) to fix them.

One of the tricks to testing crash handling is that you can’t really do it without crashing. :) To that end, we have a large array of functions that exist only to crash our client. Moreover, we have parameterized them on the context in which we would like the crash to occur: with Python on the stack, without, from a WndProc, from within an exception handler, and so forth.

This would be a terrible source of duplicated code, but for a satanic little trick that you can do with the C preprocessor. Since I’m such a nice guy, I’ll even tell you what it is right up front, so you can skip the rest of this post if you want:

Macros can be macro arguments.

Here’s how it works:

First, we’re going to define a list of crashes, using the C preprocessor:

#define CRASH_LIST(F)               \
    F(WriteToNull)                  \
    F(ReadFromNull)                 \
    F(JumpToNull)                   \
    F(CallPureCall)                 \
    F(ThrowFromExceptionHandler)    \
    F(DestroySun)

Weird, right? Right. Now, the first thing we have to do with all of these crashes is declare functions in a header file:

#define DECLARE_CRASH(T)       \
    void T();

CRASH_LIST(DECLARE_CRASH)

Next, we need to declare functions that crash within a WndProc:

#define DECLARE_WNDPROC_CRASH(T)       \
    void T ## InWndProc();

CRASH_LIST(DECLARE_WNDPROC_CRASH)

Then, we need to implement the crashes themselves:

void WriteToNull() {
    *((int*)0) = 1234;
}

/* etc */

Yeah, we can’t use the fun macro trick this time. :( You can’t win them all, I guess. Next, crashing in WndProc:

HWND createCrashWindow() { /*this is boring win32 gook */ }

LRESULT crasherWndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam) {
    if (msg == IMVU_WM_CRASH) {
        void (*crashFunc)() = reinterpret_cast<void (*)()>(wParam);
        crashFunc();
        return 0;
    } else {
        return DefWindowProc(hWnd, msg, wParam, lParam);
    }
}

void crashInWindow(void (*crashFunc)()) {
    HWND hWnd = createCrashWindow();
    PostMessage(hWnd, IMVU_WM_CRASH, reinterpret_cast<WPARAM>(crashFunc), 0);
}

#define IMPLEMENT_WNDPROC_CRASH(T)  \
    void T ## InWndProc() {         \
        crashInWindow(&T);          \
    }

CRASH_LIST(IMPLEMENT_WNDPROC_CRASH)

And lastly, we need to be able to call all of these functions from Python:

#define IMPLEMENT_BOOST_PYTHON_CRASH(T)    \
    def(#T, &T);                        \
    def(#T "InWndProc", &T ## InWndProc);

CRASH_LIST(IMPLEMENT_BOOST_PYTHON_CRASH)

Now, I’ll be the first one to wail in terror at how theoretically terrible the C preprocessor is, and what it does to maintainability, but, in this case, at least, the payoff is undeniable: with very small effort, and without having to build some kind of “CrashRegistry” framework, we can add additional crash cases to our client.
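For anyone who wants to poke at the trick outside the IMVU client, here is a minimal, portable sketch of the same idea. The names (CRASH_KIND_LIST, AS_ENUMERATOR, AS_STRING, CrashKind, kCrashKindNames) are made up for illustration and are not from our code; it assumes a C++11 compiler for constexpr and static_assert. One list is expanded twice, once into an enum and once into a table of names:

```cpp
// Hypothetical list for illustration; the names stand in for the real crash list.
#define CRASH_KIND_LIST(F) \
    F(WriteToNull)         \
    F(ReadFromNull)        \
    F(JumpToNull)

// First expansion: each entry becomes an enumerator.
#define AS_ENUMERATOR(T) T,
enum CrashKind { CRASH_KIND_LIST(AS_ENUMERATOR) CrashKindCount };

// Second expansion: each entry becomes its own name as a string literal.
#define AS_STRING(T) #T,
constexpr const char* kCrashKindNames[] = { CRASH_KIND_LIST(AS_STRING) };

// Both expansions come from one list, so they cannot drift apart.
static_assert(sizeof(kCrashKindNames) / sizeof(kCrashKindNames[0]) == CrashKindCount,
              "one name per enumerator");
```

Adding a fourth entry to CRASH_KIND_LIST updates the enum and the name table together, which is the whole payoff.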

Ghetto Closures in C++ III: Templates and Traits and Interfaces, oh my!

2009-02-10T00:00:00+00:00

Last time, we went over generating a tiny ASM thunk that could wrap a C++ method pointer (instance plus function) up in a non-method __stdcall function pointer, suitable for use as, say, a win32 WndProc. Next, I’m going to talk about wrapping it all up in a convenient API. To do this, we’re going to need some template-fu.

Since I am using a modern compiler, I am going to see how close I can get to boost.function’s interface:

boost::function<void ()> fn = &someFunction;

I consider this interface to be pretty rad.

boost::function works with only a single template argument, so we could go that route too. We could also accept that we’re doing something a bit different, and add a second parameter:

Thunk<LRESULT (Window::*)(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)> wndProcThunk;

Thunk<Window, LRESULT (HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)> wndProcThunk;

I picked the first one, mostly because it was the first thing that popped into my head. Here is my test harness:

#include <cstdio>
using std::printf;

struct I {
    virtual void print() = 0;
    virtual void printSum(double y) = 0;
};

struct C : I {
    C(int x)
        : x(x)
    { }

    void print() {
        printf("My x is %i\n", x);
    }

    void printSum(double y) {
        printf("%i + %f = %f\n", x, y, x + y);
    }

    int x;
};

int main() {
    C instance(4);

    Thunk<void (I::*)()> thunk(&instance, &I::print);
    printf("\n");
    thunk.get()();

    Thunk<void (I::*)(double)> thunk2(&instance, &I::printSum);
    printf("\n");
    thunk2.get()(3.14);
}

First, I extracted the part of the code that actually constructs and cleans up the generated code. Everything else I’ll outline will just be scaffolding so that the interface is prettier.

template <typename D, typename S>
D really_reinterpret_cast(S s) {
    char __static_assert_that_types_have_same_size[sizeof(S) == sizeof(D)];

    union {
        S s;
        D d;
    } u;

    u.s = s;
    return u.d;
}

template <typename C, typename M>
void* createThunk(C* instance, M method) {
    unsigned char code[] = {   // unsigned: 0xB9 and friends do not fit in a signed char
        0xB9, 0, 0, 0, 0,   // mov ecx, 0
        0xB8, 0, 0, 0, 0,   // mov eax, 0
        0xFF, 0xE0          // jmp eax
    };

    // YEEHAW
    *((C**)(code + 1)) = instance;
    *((void**)(code + 6)) = really_reinterpret_cast<void*>(method);

    void* thunk = VirtualAlloc(0, sizeof(code), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    memcpy(thunk, &code, sizeof(code));
    FlushInstructionCache(GetCurrentProcess(), thunk, sizeof(code));

    return thunk;
}

void releaseThunk(void* thunk) {
    VirtualFree(thunk, 0, MEM_RELEASE);
}

To be honest, this interface isn’t all that bad: all you have to do is remember to manage the lifetime of the generated code and cast the void* you get to the right type. That’s kind of boring, though, so let’s instead see if we can make something kickass and typesafe.

I like objects, so let’s start with one of those.

template <typename M>
struct Thunk {
    typedef M Method;
    typedef typename methodptr_traits<M>::class_type Class;
    typedef typename methodptr_traits<M>::function_type Function;

    Thunk(Class* instance, Method method)
        : ptr(createThunk(instance, method))
    { }

    ~Thunk() {
        releaseThunk(ptr);
    }

    Function get() const {
        return reinterpret_cast<Function>(ptr);
    }

private:
    Thunk(const Thunk&);

    void* ptr;
};

This is simple enough to be boring, except for this methodptr_traits thing.

methodptr_traits is an instance of something called a traits class. Basically, it is a fancy template type that defines various other types. You can think of it as a way to code ad-hoc, compile-time type introspection.

If you’ve never used templates this way, the implementation is pretty intimidating:

template <typename F>
struct methodptr_traits;

template <typename ReturnType, typename T>
struct methodptr_traits<ReturnType (T::*)()> {
    typedef T class_type;
    typedef ReturnType (__stdcall *function_type)();
};

This is one of the more convoluted things one can do with templates, and I’ll be the first to admit that I think it’s a bit scary. Let’s rewind a bit and look at this in simpler terms. Say we want a boolean constant that’s true if a particular template type is an integer type. We can use template specialization to accomplish this pretty easily:

template <typename T> struct is_int { enum {value = false}; };
template <> struct is_int<int> { enum {value = true}; };
template <> struct is_int<short> { enum {value = true}; };
template <> struct is_int<char> { enum {value = true}; };
// and so on

I might leverage this code with something like the following:

if (is_int<T>::value) { /* stuff */ }

and be off to the races.

Cool, right? Now what if we instead wanted to know whether something is a std::vector, whatever the element type? The same principle applies, but now we have a template specialization that is itself a template:

template <typename T> struct is_vector { enum {value=false}; };
template <typename E> struct is_vector<std::vector<E> > { enum {value=true}; };

How to use this should be obvious:

if (is_vector<T>::value) { /* do something that only works on std::vector */ }
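One caveat with the plain if above: inside a template, both branches still get compiled for every T, so a body that calls vector-only members will not build when T is, say, int. A common way around this is tag dispatch. The sketch below is an illustration, not code from this project: elementCount is a made-up function, and is_vector is restated on top of std::true_type and std::false_type (which assumes C++11's type_traits header) so that the trait itself can pick an overload.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Same trait as above, restated on the standard true/false tag types.
template <typename T> struct is_vector : std::false_type {};
template <typename E> struct is_vector<std::vector<E> > : std::true_type {};

// Picked when T is a std::vector, so vector-only members are safe here.
template <typename T>
std::size_t elementCount(const T& value, std::true_type) {
    return value.size();
}

// Picked for every other T; this body never mentions size().
template <typename T>
std::size_t elementCount(const T&, std::false_type) {
    return 1;
}

// Entry point: the trait selects an overload at compile time.
template <typename T>
std::size_t elementCount(const T& value) {
    return elementCount(value, is_vector<T>());
}
```

With this, elementCount(std::vector<int>(3)) goes through the size() overload, while elementCount(42) never instantiates it.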

methodptr_traits is just a tiny jump further. Most of the terror that this sort of thing inspires is really the fault of C++’s ridiculous function pointer syntax.

It is kind of a drag that C++ templates cannot express functions without specifying exactly how many arguments the function has. Because of this, a new specialization must be written for each argument count you want to support. I only did 0 and 1 arguments because this is just a small example. boost tends to support a minimum of 10 arguments by default, which is good enough for almost everyone.

For this example, I only need 0 and 1 argument, so here’s the specialization for a one-argument function:

template <typename ReturnType, typename T, typename Arg1>
struct methodptr_traits<ReturnType (T::*)(Arg1)> {
    typedef T class_type;
    typedef ReturnType (__stdcall *function_type)(Arg1);
};
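For what it’s worth, the one-specialization-per-argument-count drag goes away with C++11 variadic templates. Here is a minimal sketch, assuming a C++11 compiler. It is not the version used in this post: it drops __stdcall so that it builds outside of Visual C++, and Widget is a made-up class that exists only to exercise the trait.

```cpp
#include <type_traits>

template <typename F>
struct methodptr_traits;

// A single partial specialization covers every argument count.
template <typename ReturnType, typename T, typename... Args>
struct methodptr_traits<ReturnType (T::*)(Args...)> {
    typedef T class_type;
    // Plain function pointer; the post's version adds __stdcall here.
    typedef ReturnType (*function_type)(Args...);
};

// Hypothetical class, declared only so we have method pointers to inspect.
struct Widget {
    void show();
    long resize(int width, double scale);
};

static_assert(std::is_same<methodptr_traits<decltype(&Widget::show)>::function_type,
                           void (*)()>::value,
              "zero-argument method");
```

The parameter pack Args matches zero, one, or ten arguments alike, so the trait never needs another specialization for argument count.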

And that’s it! With a single templatized class, we can dynamically generate an assembly thunk that works as a perfectly usable __stdcall function. We can pass this function pointer on to Win32, GLU, or whatever other C library we might need a callback function for.

Ghetto Closures in C++ II: __thiscall

2009-02-09T00:00:00+00:00

Today, we’re going to extend our little closure library to support the __thiscall calling convention.

After some poking around, I have discovered that some guy has figured this out already. Rad! I’m going to go over it quickly anyway so that I can build on it for tomorrow.

First, the setup/demo part changes a bit:

struct I {
    virtual void print() = 0;
};

struct C : I {
    C(int x)
        : x(x)
    { }

    void print() {
        printf("My x is %i\n", x);
    }

    int x;
};

typedef void (__stdcall *Function0)();

It turns out that __thiscall and __stdcall are not all that different. All you need to do is stuff your this pointer in ECX, and you’re set. Our satanic little blob of assembly changes to:

__asm {
    mov ecx, my_this        // B9 xx xx xx xx
    mov eax, real_func      // B8 yy yy yy yy
    jmp eax                 // FF E0
}

As before, we just have to create a block of memory, plug the code and pointers into it, and execute it:

// YEEHAW
*((I**)(code + 1)) = instance;
*((void**)(code + 6)) = really_reinterpret_cast<void*>(&I::print);

void* buffer = VirtualAlloc(0, sizeof(code), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(buffer, &code, sizeof(code));
FlushInstructionCache(GetCurrentProcess(), buffer, sizeof(code));

Function0 f0 = reinterpret_cast<Function0>(buffer);
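The magic offsets 1 and 6 fall out of the instruction encoding: each mov is one opcode byte followed by a 32-bit immediate, stored least-significant byte first. The sketch below spells that layout out without executing anything, so it runs on any platform. ThunkBytes, byteAt, and encodeThunk are hypothetical names for illustration, the two addresses are just numbers, and a C++11 compiler is assumed for constexpr.

```cpp
#include <cstdint>

// The 12 bytes of the thunk: two 5-byte mov instructions and a 2-byte jmp.
struct ThunkBytes {
    unsigned char bytes[12];
};

// Byte `index` of a 32-bit value, where index 0 is the least significant byte.
constexpr unsigned char byteAt(std::uint32_t value, int index) {
    return static_cast<unsigned char>((value >> (8 * index)) & 0xFF);
}

// thisPtr and target are hypothetical 32-bit addresses.
constexpr ThunkBytes encodeThunk(std::uint32_t thisPtr, std::uint32_t target) {
    return ThunkBytes{{
        // mov ecx, thisPtr (immediate occupies offsets 1..4)
        0xB9, byteAt(thisPtr, 0), byteAt(thisPtr, 1), byteAt(thisPtr, 2), byteAt(thisPtr, 3),
        // mov eax, target (immediate occupies offsets 6..9)
        0xB8, byteAt(target, 0), byteAt(target, 1), byteAt(target, 2), byteAt(target, 3),
        // jmp eax
        0xFF, 0xE0
    }};
}
```

Writing a pointer through code + 1 and code + 6, as the real code does, fills in exactly those two immediates on a 32-bit x86 build.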

really_reinterpret_cast is a hack because we are doing Very Bad Things. The C++ standard says that you cannot convert pointer-to-methods to any other kind of pointer (not even void*!). I was unable to dig up the exact reasoning, but I think it is because the C++ implementation is allowed to make pointer-to-members have any format it wants. They don’t even have to be the same size as a normal pointer.

But that’s boring. :D I doubt Microsoft will change how this works any time soon now that existing code depends on it, and there is potential here to do something that is very useful.

It turns out that the unspecified implementation that they chose was for a pointer-to-member-function to either point to the code directly (like a stdcall function), or to point to a thunk that will go to the right place, if the method is virtual. In other words, we can just jump to it and we will be jumping to the right place.

Here’s my evil, standard-subverting cast:

template <typename D, typename S>
D really_reinterpret_cast(S s) {
    char __static_assert_that_types_have_same_size[sizeof(S) == sizeof(D)];

    union {
        S s;
        D d;
    } u;

    u.s = s;
    return u.d;
}

That weird looking char array is an ad-hoc compile-time assertion. A C array of length 0 is illegal, so if sizeof(S) != sizeof(D), then the compile will fail.
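On a C++11 compiler the same check can be spelled with static_assert, which gives a readable error message instead of a complaint about array sizes. Here is a sketch under that assumption. copy_cast is a made-up name, and it copies bytes with memcpy instead of going through a union. Using it on a pointer-to-member is still every bit as non-portable as the version above.

```cpp
#include <cstring>

template <typename D, typename S>
D copy_cast(S s) {
    // Stands in for the zero-length-array trick, with a readable message.
    static_assert(sizeof(S) == sizeof(D), "copy_cast: types must have the same size");

    D d;
    std::memcpy(&d, &s, sizeof(d));  // copy the raw bytes of s into d
    return d;
}
```

For example, copy_cast<unsigned int>(1.0f) compiles wherever float and unsigned int are the same size, while copy_cast<char>(1.0f) is rejected at compile time with the message above.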

And, as before, the sweet thrill of victory:

Function0 f0 = reinterpret_cast<Function0>(buffer);
f0();

woo.