Guiding With Types, Property Based Testing and Dependency Resolution

Table of Contents

  • Guiding With Types
  • Property Based Testing a Bencoder Serializer
  • Dependencies Should Be Pinned
  • The Simple Programmer

Guiding With Types

I spend most of my free time in OCaml, and part of the draw is the type system. It's powerful but simple. A lot of the discussion I see around types is about whether or not they are useful for reducing the bug count in a program. Another place I have been finding types valuable is in guiding development. In my experience, types combined with a desired behaviour of the code can get one pretty far in writing correct software. In particular, if one can track down which functional abstraction a problem falls into, the types can act as a landmark, making sure the code is going in the right direction.

One difference between what I'll call functional patterns and the kind commonly seen in imperative or OOP languages is that the functional patterns often have a type associated with them. Examples of this are functor, applicative, and monad. To contrast, some common patterns in imperative languages are the Strategy, Observer, and Visitor patterns. The documentation around these patterns has some UML, or similar, drawings but nothing you can feed into a compiler. That doesn't make them better or worse than the functional patterns; like everything, it depends on the situation. But the great thing about having a type I can give my compiler is that if I can get the code to compile, there is a good chance it's pretty correct.

I'm building an applicative, monadic concurrency library in OCaml right now. Knowing it is applicative and monadic, the first thing I did was specify the types of the functions:

val return : 'a -> 'a t
val map : ('a -> 'b) -> 'a t -> 'b t
val app : ('a -> 'b) t -> 'a t -> 'b t
val bind : 'a t -> ('a -> 'b t) -> 'b t

It turns out, the intersection of functions one could write that match those types and functions that actually make sense is pretty small, and just compiling goes a long way towards being correct. Take map, for example: it's clear that the function passed in can't give me any interesting information, as it takes, and returns, unknown types. All I can do is take a value out of the t type, give it to the function, and pack the result back into a t. With app and bind I'm playing similar games, where I don't know what the function does, just a little bit about the types involved; I'm taking values out of the t type and putting values back into it in different ways.
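To see how little freedom those signatures leave, here is a sketch of the same four functions in Python, using a trivial, hypothetical Box container standing in for 'a t:

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")

class Box(Generic[A]):
    """A trivial stand-in container; its only job is to hold one value."""
    def __init__(self, value: A) -> None:
        self.value = value

def ret(x: A) -> Box[A]:                                  # return : 'a -> 'a t
    return Box(x)

def map_(f: Callable[[A], B], t: Box[A]) -> Box[B]:       # map
    # f tells us nothing; the only sensible move is unwrap, apply, rewrap.
    return Box(f(t.value))

def app(ft: Box[Callable[[A], B]], t: Box[A]) -> Box[B]:  # app
    # Unwrap both the function and the value, apply, rewrap.
    return Box(ft.value(t.value))

def bind(t: Box[A], f: Callable[[A], Box[B]]) -> Box[B]:  # bind
    # f already produces a Box, so no rewrapping is needed.
    return f(t.value)
```

For any container with one value slot, the types leave essentially no other sensible implementation.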

Knowing that my code matches these types also gives me information about how I can compose these operations to make larger programs. At that point, though, one needs to do some testing. OCaml's type system isn't powerful enough (as far as I know) to express what should happen when composing these operations. But the combination of writing code that matches the known types of the abstraction and the code making sense gets me pretty far before having to test.

That's where I think a lot of the benefit of types is, and I don't see it getting as much appreciation as it probably should. I think this is why a lot of functional programmers really value figuring out what kind of abstraction their problem falls into. Once you know that, you can write down the types and start solving it, and the compiler will guide you.

Property Based Testing a Bencoder Serializer

Property based testing is one of the few forms of testing I kind of enjoy doing. The original tool for it is called QuickCheck, and many of the derivative tools have similar names. I have only used QuickCheck-style tools for Erlang, OCaml, and Python (the Python one, Hypothesis, is probably the nicest I've seen for a non-functional language). The gist of property based testing is that the testcase is a function which takes a random instantiation of the parameters of the test and asserts something is true about an operation performed on the input. This thing that is being asserted is the property. So instead of a unit test which asserts something is true for a particular input, property based tests assert something is true for all inputs.

The Hello World of prop testing is encoding and decoding a value. The property being tested is that decoding an encoded value gives back the original value. Something like the example below, where value is automatically generated by the test framework:

def test_encoding(value):
    assert (decode(encode(value)) == value)

I am implementing a BitTorrent client to test the concurrency library mentioned in the article above, and BitTorrent encodes data in a simple format called bencode. Bencode contains four data types: integers, strings, lists, and dictionaries where the keys are strings. I wanted to make a property test for my encoding and decoding code. In this case I wanted to test something slightly different than the test_encoding function given above: I also want to test that the encoding is deterministic, because bencoding requires that the encoded representation of a dictionary be sorted by its keys. The test will look more like:

def test_encoding(value):
    s = encode(value)
    assert (encode(decode(s)) == s)

Before continuing, I want to be clear on what is not being tested by this:

  • I am not testing that the bencoded output is correct.
  • I am not testing that the dictionary keys are actually sorted.
  • I am not testing that the decoded value is the same as value, the input.
  • I am not testing that I handle malformed bencoded strings, which could be an attack vector for a malicious individual.

What I am testing is that the output is deterministic and that there are no values which cannot be encoded or decoded. The generated value will cover a wide range of inputs, which will exercise the encoding and decoding logic pretty heavily. And that's what I want: exercising as many cases in the serializer as possible.

The reason this is a valuable way of testing is that there are many inputs that could make the serializer fail which I could easily forget to enumerate in individual unit tests. A simple example is negative numbers: many people only think about positive numbers and forget to make test cases for negative values. Or strings with non-printable characters in them. Property based testing is not as powerful as a formal proof of the correctness of the serializer; however, it's far more powerful than trying to enumerate all the interesting inputs as unit tests. It sits somewhere between unit testing and formal proofs. The tests are definitely a bit trickier to write than unit tests, but it's like writing a thousand unit tests for a fraction of the time spent.
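To make the property concrete outside of OCaml, here is a toy sketch in Python: a minimal bencode encoder/decoder plus a hand-rolled random generator standing in for the property-testing framework. Everything here is illustrative rather than the real implementation; an actual test would lean on Hypothesis or QCheck for generation and shrinking.

```python
import random

# Minimal bencode encoder covering the four types; dictionaries are
# emitted with sorted keys, which is what makes encoding deterministic.
def encode(v):
    if isinstance(v, int):
        return b"i%de" % v
    if isinstance(v, bytes):
        return b"%d:%s" % (len(v), v)
    if isinstance(v, list):
        return b"l" + b"".join(encode(x) for x in v) + b"e"
    if isinstance(v, dict):
        return b"d" + b"".join(encode(k) + encode(v[k]) for k in sorted(v)) + b"e"
    raise TypeError(v)

# Decoder: returns (value, index just past the parsed value).
def decode(s, i=0):
    c = s[i:i+1]
    if c == b"i":
        j = s.index(b"e", i)
        return int(s[i+1:j]), j + 1
    if c == b"l":
        out, i = [], i + 1
        while s[i:i+1] != b"e":
            v, i = decode(s, i)
            out.append(v)
        return out, i + 1
    if c == b"d":
        out, i = {}, i + 1
        while s[i:i+1] != b"e":
            k, i = decode(s, i)
            v, i = decode(s, i)
            out[k] = v
        return out, i + 1
    j = s.index(b":", i)
    n = int(s[i:j])
    return s[j+1:j+1+n], j + 1 + n

# Bounded random generator: only scalars once the depth budget runs out.
def random_value(depth=2):
    kinds = ["int", "str"] + (["list", "dict"] if depth > 0 else [])
    kind = random.choice(kinds)
    if kind == "int":
        return random.randint(-1000, 1000)
    if kind == "str":
        return bytes(random.choices(range(256), k=random.randint(0, 10)))
    if kind == "list":
        return [random_value(depth - 1) for _ in range(random.randint(0, 5))]
    return {bytes(random.choices(b"abcdef", k=3)): random_value(depth - 1)
            for _ in range(random.randint(0, 5))}

# The property: re-encoding a decoded value reproduces the bytes exactly.
for _ in range(100):
    s = encode(random_value())
    assert encode(decode(s)[0]) == s
```

Note that the property holds even though Python dictionaries don't remember bencode's key order, precisely because the encoder sorts keys itself.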

Back to testing my bencoder. My bencoder is written in OCaml and I will be using the QCheck library. The test for this is actually quite easy to write:

fun view ->
  let t1 = Bencode.of_view view in
  let b1 = Bencode.to_bytes t1 in
  let t2 = snd (Bencode.of_bytes b1) in
  let b2 = Bencode.to_bytes t2 in
  b1 = b2

As input to the test, I take a value called view. My bencoder library has two in-memory representations. The primary one is an opaque type, which means a user of the library cannot access the contents of the value directly. In order to access the decoded value, it is converted to a view (the type is given below), which is a type users of the library can see and use directly. This is a convenient pattern for decoupling the internal representation from the one users interact with. For example, if it turned out that most bencoded strings are not actually used, I could make the internal representation the encoded string and decode it only when I convert it to a view, saving me from decoding strings that don't need to be.

Regardless, this means that the value QCheck will construct for the test is of the view type, so the test first turns the view into the internal representation with the of_view function. Then the internal representation is encoded into bytes, the bytes are decoded back to the internal representation, that is encoded back into bytes, and I test that the two byte strings match.

module View : sig
  type t =
    | String of string
    | Int of Num.num
    | Dict of t Map.Make(String).t
    | List of t list
end
The test case is quite simple and wouldn't be worth writing an article about if that were all there was to it. The challenge is how to construct a view. For that, QCheck provides some helpful combinators for building a generator for an arbitrary type. A generator is a function that produces a random instance of some value on each invocation.

Making a generator for a scalar is pretty straightforward, so I'll start with generators for integers and strings:

(* assuming QCheck.Gen is open *)
let bencode_int =
  map
    (fun i -> Bencode.View.(Int (Num.num_of_int i)))
    int

let bencode_str =
  map
    (fun s -> Bencode.View.(String s))
    string
These make use of the map combinator in the QCheck.Gen module. What these do is use the int and string generators that come with QCheck and pass the values those generate into the function, which converts them to a View.t. Now bencode_int and bencode_str are generators and can be invoked (through another function in QCheck) to produce random values.

Lists and dictionaries are different than scalars, though: they are recursive data structures. When generating a list, I need to generate the things it contains, which could be another list or a dictionary, and the same is true of dictionaries. The generator needs to be able to reference itself to generate the next layer of values when it has generated a list or dictionary. On top of that, because this is all random, I want to make sure it doesn't generate structures that are infinitely deep (list of list of list to infinity).

Before making the generator, I want two helper functions. The first will take a list of View.t's and turn it into a view. The second will take a list of pairs of string and View.t and turn them into a view.

let make_list l = Bencode.View.(List l)

let make_dict pairs =
  let m =
    ListLabels.fold_left
      pairs
      ~init:String_map.empty
      ~f:(fun acc (k, v) -> String_map.add k v acc)
  in
  Bencode.View.(Dict m)

Now for a whole bunch of concepts in one go. The requirements for the generator are that it needs to be self-referential, it needs to be bounded, and it needs to generate an arbitrary View.t. Being self-referential is done with the fix combinator, and bounding a generator is done with the sized combinator. sized generates a random positive number, and fix takes a function which takes a reference to itself plus the generated size and returns a generator. If the size is 0, this function generates an int or string. If the size is larger, it may still generate an int or string, or it may generate a list or dictionary and call itself again with a smaller size. Eventually the size reaches zero, at which point the generator can only produce a scalar.

let bencode_gen =
    sized @@ fix
      (fun self n ->
        (* Limit the depth to 4 *)
        match min n 4 with
          | 0 ->
            oneof [bencode_int; bencode_str]
          | n ->
            frequency
              [ (1, oneof [bencode_int; bencode_str])
              ; (2, map
                      make_list
                      (list_size
                         (int_range 0 10)
                         (self (n - 1))))
              ; (3, map
                      make_dict
                      (list_size
                         (int_range 0 10)
                         (pair string (self (n - 1)))))
              ])

The added bit here is frequency, which takes a list of generators, each weighted by how often it should be used, and picks between them with the desired relative frequency. So in this case, when the size, n, is greater than 0, there is a 1/6 chance of generating a scalar, a 1/3 chance of generating a list, and a 1/2 chance of generating a dictionary.

For the case of the list, list_size generates a list between 0 and 10 elements long and the elements in the list are the result of calling self (n - 1). Then make_list is called with the generated list.

In the case of the dictionary, list_size is used again, except rather than the elements just being a call to self, each element is a pair of string, which will be the key, and a call to self (n - 1) for the value. Then, again, map is used to turn that list of pairs into a View.t.

Finally, the min n 4 is there because this generator can use a lot of memory so the depth is limited to 4.

Now to put it all together:

let bencode_prop =
  QCheck.Test.make
    ~count:100
    (QCheck.make bencode_gen)
    (fun view ->
      let t1 = Bencode.of_view view in
      let b1 = Bencode.to_bytes t1 in
      let t2 = snd (Bencode.of_bytes b1) in
      let b2 = Bencode.to_bytes t2 in
      b1 = b2)

let () =
  QCheck.Test.check_exn
    bencode_prop
    ~rand:(Random.State.make_self_init ())

Great, this will run 100 randomly generated serialization tests.

There is one important missing element in this test: I have not specified how to pretty print a View.t. The QCheck framework has a way to specify how to print a generated value, but I have not implemented it. I'll leave that as an exercise for the reader.

Property based testing is great, and with a little bit of work I've gotten a test that covers a lot of possibilities in my code. This test does not cover every possibility, though, so I am using it alongside unit tests that cover specific scenarios. I have crafted a few bencoded strings and decode them to make sure they work and that the correct tokens are being used to specify the type of an encoded value. Those unit tests tell me that the basic encoding/decoding works as expected; the property test then tries a wide range of values and tells me there is only a very small chance I have missed an edge case.

Dependencies Should Be Pinned

Developing a software project almost always requires using other libraries and tools. Unless one is using a monorepo, it is likely that those dependencies will be developed independently of the project and included through a dependency management tool. Most languages have their own dependency managers these days, for example maven, rebar and opam. With dependencies comes the problem of determining which version of a dependency to use. A large project will most likely have at least one dependency that is specified multiple times. For example, project P has dependencies D1 and D2, both of which depend on D3. If D1 and D2 specify different versions of D3 to be used, determining which version of D3 to pick is called dependency resolution and has been the source of several headaches throughout the ages.

Given that most languages require a single version of a dependency, there are a number of ways dependency managers attempt to deal with the possible conflict automatically. Unfortunately, almost all of them are broken in some way and create problems. Below are some of the solutions I've come across or wish I had.

Conflict Strategies

Simple yet effective: Pinning

Pinning is where every dependency is assigned a single, concrete version number.

In my experience, every dependency resolution strategy comes down to someone having to swing the pinning hammer anyway. Pinning is very simple but does have some overhead, as every dependency needs to have its version specified somewhere. The benefit is that there is never any question as to which version of a dependency will be used in a project. This is the solution that the Nix package manager uses, and it is what a monorepo boils down to.
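As a sketch, pinning in an opam dependency list amounts to exact version constraints on every dependency (the package names and version numbers here are purely illustrative):

```
depends: [
  "lwt" {= "2.5.2"}
  "qcheck" {= "0.4"}
]
```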

Pick A Random One: Rebar

This is the rebar solution (at least in rebar2). When fetching dependencies, rebar would non-deterministically pick one of the versions. This solution is obviously completely wrong: non-determinism in the test and build process of a project brings new meaning to "works for me". In the case of rebar, I don't think this strategy was a deliberate decision, but rather the result of applying a simple recursive function to fetch the dependencies of a project, with the dependencies probably stored in an unordered data structure. On top of that, if dependencies were fetched in parallel, variation in how long it took to download one dependency and move on to the next made the outcome non-deterministic.

On the project where we discovered this, we noticed it because a build would sometimes fail saying a function did not exist. It turned out that different major versions of a dependency were randomly winning the lottery, and sometimes we got the version that didn't have the API we needed. The downside is that it cost at least a whole morning of confusion about what was going on. The upside is that the error presented itself at build time and could be debugged before going out to production. Who knows how many releases of our service we did before that with subtler bugs.

Nearest To Me: Maven

In the Java world, building even the simplest program can feel like you have to download half of the Internet in dependencies. The tool of choice these days is generally maven, sometimes with something wrapping it. Maven calls the handling of this dependency mediation, and its behaviour is documented. The policy is that the dependency nearest to the project in the dependency graph gets chosen. If two dependencies are at the same distance, the behaviour up to maven 2.0.9 was the "Pick A Random One" strategy. After 2.0.9 the strategy became "first declaration wins", which I assume means the first one that shows up textually in the dependency list.

The logic behind this heuristic makes some common situations awkward. For example, my project is working just fine, then I add another dependency, and all of a sudden parts of the build unrelated (or so it seems) to that dependency no longer compile. To get out of these situations without rewriting anything, one can employ some gymnastics with the shade plugin. The "Nearest To Me" strategy is not obviously wrong (unless it's picking a random dependency) but it's one of those heuristics that is right all the way up until it is completely wrong.

Constraint Solving: OPAM

The OCaml package manager, OPAM, goes for a constraint solver. Constraint solving is an NP-complete problem, which is why opam install can take a while to give the list of dependencies it will install. It also doesn't always give the expected solution; multiple times opam upgrade has wanted to put me on weird versions of packages. I don't know if this is still the case because, after that happened a few times, I just stopped using opam upgrade.

Constraint solving is the smart solution, but that doesn't necessarily make it better. At the very best, it's limited by the quality of the information it's given: if a package has a bad constraint on it, the solver will produce a bad solution. Recently I tried to install a newer version of the Core suite, which failed because Core depended on another package called ppx_deriving, and there had recently been a new major version release of ppx_deriving. The Core package did not have an upper bound constraint on that package. As a result, no Core for me, because of a backwards-incompatible API change. Luckily, OPAM lets one pin packages at the dependency manager level, overriding any constraints in the packages.

Error On Conflicting Dependencies

I don't know of any systems that do this, but I sure would like one. In this case, a project with conflicting transitive dependencies simply does not build, and a person has to explicitly state which one should win. In reality, this is probably where every dependency manager should have started: provide a default policy of throwing up its hands and yelling for help, and then let projects insert their own resolution policy if they see fit.

Support Multiple Library Versions

Some languages do allow multiple versions of a dependency to be included in a single project. I have no direct experience with this, so I cannot speak to how well it works. One place one does need to be careful here, though, is with data structures: if the layout of a data structure has changed between versions, it is not valid to pass an instance created by library version 1 into a function from library version 2.

Simplicity is Best and Easiest

One reason this is all such a mess is that there just is not enough information in a version number to reason about which versions work together and which do not. Even semantic versioning only states whether two versions of a package are compatible; it could be that the subset actually used in a project did not change between major versions. Automatically examining APIs is possible but not sufficient, because it is, unfortunately, not possible in the general case to test two function implementations for equality.

Not only is version information limited, but humans are the ones versioning software artifacts, and humans make mistakes. Or sometimes they value things other than a correct version. Even OPAM, a package manager, has failed to properly encode compatibility in its own versions. And don't even get me started on the 0.x.y loophole in semantic versioning. I think the truth is that version numbers do not exist for computers to do smart things with; they exist to give humans hints. Since a human is going to have to intervene eventually anyway, pinning is the only strategy that is guaranteed to work. It's like freezing a moment in time. All other strategies end up with a human pinning a version, so why not start there?

The Simple Programmer

It feels like every program I write at work is an orchestrator of some kind, whether it's doing a rolling restart of services or automating backups. I've found that for most of these problems, a state machine expresses the solution succinctly and reliably. When possible, representing the state machine as a data structure that is interpreted opens up a lot of opportunities. A map can be used to represent the state machine, where the key is a state and the value is a function that returns a new state. The only other piece needed is a way to determine if a state is terminal, which means the machine is done. An interpreter can be as simple as:

state = state_machine[START]()
while not terminal(state):
    state = state_machine[state]()
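Filled in with a hypothetical two-state machine (all names here are made up for illustration), the loop above runs as-is:

```python
# A runnable version of the interpreter loop, using a made-up machine
# that goes START -> WORKING -> DONE.
START, WORKING, DONE = "START", "WORKING", "DONE"

def terminal(state):
    # DONE is the only terminal state in this example
    return state == DONE

state_machine = {
    START: lambda: WORKING,   # each value returns the next state
    WORKING: lambda: DONE,
}

state = state_machine[START]()
while not terminal(state):
    state = state_machine[state]()
# the loop exits once state == "DONE"
```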

The simple interpreter does the trick, but it's a little too opaque. It would be nice if I could determine which state a transition leads to without executing the function; then I could do some useful things with the state machine. To do that, I first need to make the values in the map something slightly less expressive than a bare function. The two kinds of operations in this state machine are an action, which I execute for its side effect, and a predicate, which lets me branch in the state machine. When I generate the state machine I'll pass in an object that implements an action method and a predicate method, and I can then use that state machine with different interpreters. So creating a state machine looks something like:

def make_state_machine(sm):
    return { START: sm.predicate(test1, TEST1_TRUE, TEST1_FALSE),
             TEST1_TRUE: sm.action(do_thing, THING_DONE),
             TEST1_FALSE: sm.action(do_other_thing, OTHER_THING_DONE) }

I will also need different interpreters for different implementations of sm, so I can add a run method to the sm type as well. Now I can bring the example together:

class Sm:
    def run(self, state_machine):
        state = state_machine[START]()
        while not terminal(state):
            state = state_machine[state]()

    def action(self, f, state):
        def _():
            f()           # perform the side effect
            return state  # then transition
        return _

    def predicate(self, f, t_path, f_path):
        def _():
            if f():
                return t_path
            return f_path
        return _

sm = Sm()
state_machine = make_state_machine(sm)

With this little bit of abstraction I can accomplish quite a bit: I just need to define a new Sm class for each of the different things I'd like to do. What sort of things might those be? One is analyzing the state machine to find states that can never be reached. Another is outputting a picture of the state machine, perhaps through graphviz. Another interpreter could output the state machine in another language. The possibilities are endless.
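As a sketch of that first idea, here is a hypothetical analysis interpreter, assuming the action/predicate shape above: instead of running side effects, it records the transitions each state can take, which is enough to find unreachable states.

```python
# A hypothetical analysis interpreter: rather than callables, the map's
# values become lists of possible next states, so the machine can be
# walked as a graph without executing anything.
START, DONE = "START", "DONE"

class GraphSm:
    def action(self, f, state):
        return [state]           # an action can only go one place

    def predicate(self, f, t_path, f_path):
        return [t_path, f_path]  # a predicate can branch two ways

def reachable(state_machine, start=START):
    # Depth-first walk over the recorded transitions.
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        stack.extend(state_machine.get(s, []))
    return seen

# Example machine with one state nothing transitions to.
sm = GraphSm()
machine = {
    START: sm.predicate(None, "WORK", DONE),
    "WORK": sm.action(None, DONE),
    "ORPHAN": sm.action(None, DONE),  # never reached from START
}
unreachable = set(machine) - reachable(machine)
# unreachable == {"ORPHAN"}
```

The same machine-building function would produce this graph when handed a GraphSm instead of the executing Sm, which is the payoff of keeping the state machine as data.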

This pattern is really handy for orchestrators because most orchestrators are doing little more than transitioning some entity through a series of states. For the more functionally inclined, this is an OO-style subset of what can be implemented with free monads, which are a very powerful tool and worth learning about.

Updated: 2016-09-13 Tue 12:43