OCaml: Building and Packaging

Building OCaml Projects With pds

pds is a simple build system designed for small projects. It currently targets Unix-like operating systems and turns a TOML file into a set of Makefiles. This post is written for pds major version 5.

To get pds, install it via OPAM with: opam install pds

Why does pds exist when there are other solutions like omake and ocamlbuild? The author's belief is that these tools are fine but a bit too complicated or verbose for many basic projects. pds is deliberately less expressive than existing tools, in the hope that this makes the process simpler and easier. pds is meant to play well with other tools, so if someone does not like pds or has a complicated build that is part pds and part something else, it shouldn't get in the way.

There are a few reasons for Makefiles as the output. First, it simplifies pds considerably: pds only needs to know how to generate a Makefile for OCaml builds rather than be a generic build system. Second, it allows users to extend a build by hooking in their own Makefiles to do the extra work. Finally, something that can execute a Makefile exists on almost all platforms, making them a fairly portable solution. The Future work section proposes some changes that could abstract more of the Makefiles away, making pds more portable.

A simple pds project

A project that uses pds has two things:

  1. The code in a particular directory layout.
  2. A pds.conf file that describes the build.

When the pds program is run, it will read the pds.conf and produce several Makefiles. The entry point to these Makefiles is pds.mk, which supports a number of targets, including all (which builds everything), test, and install.

Here's an example. First, create the directory structure and source files.

  1. Create a directory called pds_example and go into it.
  2. Create directories called src/example and tests/test_example.
  3. In src/example create a file called example.ml and in it put let foo = 4.
  4. In tests/test_example create a file called test.ml and in it put let () = assert (Example.foo = 4).
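
The four steps above can also be scripted. A minimal sketch, assuming a POSIX shell and that it is run from the directory in which pds_example should be created:

```shell
# Create the layout pds expects: one directory per project under src/ and tests/.
mkdir -p pds_example/src/example pds_example/tests/test_example
cd pds_example

# The source project: a single module exposing one value.
printf 'let foo = 4\n' > src/example/example.ml

# The test project: each .ml file in this directory is its own test.
printf 'let () = assert (Example.foo = 4)\n' > tests/test_example/test.ml
```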

Now with this, make pds.conf in the root directory of the project:

[src.example]
install = false

[tests.test_example]
deps = ["example"]

This pds.conf says that there is a source project called example and to not install it when the install target is run. There is a collection of tests called test_example and they depend on the example source project.

With that, the code can be built and tested:

  1. Run pds to generate the Makefiles.
  2. Run make -f pds.mk to build everything (on *BSD use gmake).
  3. Run make -f pds.mk test to run the tests. Running the test target will also build, or rebuild if necessary, anything the test depends on.
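
These steps are easy to wrap in a small script. The sketch below is one way to do it and is not something pds ships: the function names pick_make and build_and_test are made up for this example, and the uname patterns are my assumption about which systems install GNU make as gmake.

```shell
# Choose the GNU make binary name from a `uname -s` string.
# Assumption: on the BSDs, GNU make is installed as gmake.
pick_make() {
  case "$1" in
    *BSD|DragonFly) printf 'gmake\n' ;;
    *) printf 'make\n' ;;
  esac
}

# Regenerate the Makefiles, build everything, then run the tests.
# The && chain stops at the first step that fails.
build_and_test() {
  mk=$(pick_make "$(uname -s)")
  pds && "$mk" -f pds.mk && "$mk" -f pds.mk test
}
```

Calling build_and_test from the project root runs all three steps in order.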

That's it for a basic setup. A few notes:

  • The pds program needs to be run before every build so that it can regenerate the Makefiles to reflect new source files or changes to pds.conf.
  • Source projects build a library or executable based on all of the source files in the directory.
  • For OCaml projects, both native and byte code output will always be generated and tested.
  • For tests, each individual file is its own test, so tests/test_example/ could have had multiple .ml files in it and each one would be an individual test. Running the test target will compile and execute each one of those. For performance reasons it might be desirable to have multiple tests in a single executable. How to accomplish that is entirely up to which test harness you use. I generally have one test file per aspect of the thing I'm testing with several tests in that file.
  • A test is considered failed if the executable has a non-zero exit code.
  • If you have changed pds.conf you might have to do make -f pds.mk clean. Currently pds does not track if a change to pds.conf requires a rebuild. Adding a new project to the configuration does not require a rebuild because it will just build the new sources.

All output is placed into the build directory which has sub-directories for the type of build the files correspond to such as build/release or build/debug.

Some more advanced things

Often there are multiple kinds of builds one wants, such as release, debug, and profile. pds supports these, although it does not attach any meaning to them: they are just build names that are available for configuration. The release build is the default build for everything and must exist. The other builds will have the same configuration as the release build unless overridden. One can override options at a global level as well as in an individual project configuration.

For example, in order to debug the example project it needs to have some debug compile options specified. To do this, add a debug section to the pds.conf. The full file will now look like:

[src.example]
install = false
debug = { extra_compiler_opts = "-g" }

[tests.test_example]
deps = ["example"]

One could also have made a separate section in the TOML called [src.example.debug]. I prefer debug = ... because it's easy to make a typo in a separate section name, and pds will not detect it.

Now execute the debug build: make -f pds.mk debug

pds handles release and debug builds coexisting just fine because it compiles them into separate build directories. As mentioned before, the current released version, 5.15, does not properly handle modifying pds.conf and rebuilding. One will need to perform a clean and then build again to make sure everything is built with the new options.

A debug test can also be run, which will run the tests but using the debug versions of the projects. To do this run: make -f pds.mk test-debug. The same applies for profile builds as well.

Sometimes one wants default build options for every project. These go in a global section:

[global.debug]
extra_compiler_opts = "-g"

[src.example]
install = false

[tests.test_example]
deps = ["example"]

The precedence rules are that the configuration in a project always wins, followed by the global section, and finally the release build options are used as the default.
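
As an illustration of that lookup order only (this is not pds's code, and resolve_opt is a name invented for this sketch), here is a shell function that picks the first value that is set. It treats an empty string as "not set", which is a simplification.

```shell
# Resolve one option for one build, using the order described above:
#   $1 = value from the project's section for this build
#   $2 = value from the global section for this build
#   $3 = value from the release build (the default)
# Simplification: an empty string means "not set".
resolve_opt() {
  if [ -n "$1" ]; then printf '%s\n' "$1"
  elif [ -n "$2" ]; then printf '%s\n' "$2"
  else printf '%s\n' "$3"
  fi
}

# No project-level debug option, but [global.debug] sets one: the global value wins.
resolve_opt "" "-g" "-safe-string"   # prints: -g
```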

A common pattern I find myself doing with globals is something like:

[global.release]
extra_compiler_opts = "-safe-string -noassert"

[global.debug]
extra_compiler_opts = "-g -safe-string"

[src.example]
install = false

[tests.test_example]
deps = ["example"]

Of course, projects often have more configuration than just whether they should be installed. The most common is dependencies, which are specified as a list of names that can be found through ocamlfind. For example, if the example project depended on the regular expression package, re, to build, it would look like:

[src.example]
install = false
deps = ["re"]

[tests.test_example]
deps = ["example"]

Again, be sure to run pds first, then make -f pds.mk clean, then make -f pds.mk to build. The clean is only necessary if no source file changed along with pds.conf. If a source file was changed to use the regular expression module at the same time pds.conf was updated, the source will recompile just fine.

The deps option can also refer to internal packages, as in the case of tests.test_example. pds will determine which deps are internal and which are external and create the correct build order.

Formatted pds.conf

A final feature that pds has is the -f option, which stands for "formatted". Running pds -f in a project will output a simplified view of the pds.conf. The output is tab separated with the following columns:

Column Index  Description
1             Type of build (src or test)
2             Name of project
3             Build type (library, exec, empty for third-party)
4             Project type (ocaml, third-party)
5             Comma separated list of dependencies

This feature is useful for simple interpretation of a pds configuration. An example of this is merlin-of-pds which is a simple awk script to take the formatted output and create a .merlin file.
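
To give a feel for how little code such a consumer needs, here is a sketch in the same spirit. It is not the real merlin-of-pds: the function name merlin_sketch is invented, the sample rows in the usage note are hypothetical, and it assumes test projects live under tests/ as in the example project. It only emits merlin's S (source directory) and PKG (ocamlfind package) directives and leaves out build directories.

```shell
# Read the tab separated output of `pds -f` on stdin and print .merlin lines.
# Assumptions: column 1 is "src" or "test", and test projects live in tests/.
merlin_sketch() {
  awk -F'\t' '
    {
      # Source directory for this project, e.g. src/example.
      dir = ($1 == "test") ? "tests" : $1
      print "S " dir "/" $2

      # Remember project names so internal deps can be told apart later.
      internal[$2] = 1

      # Column 5 is a comma separated list of dependencies.
      n = split($5, d, ",")
      for (i = 1; i <= n; i++) if (d[i] != "") deps[d[i]] = 1
    }
    END {
      # Anything that is not a project in this pds.conf is an ocamlfind package.
      for (p in deps) if (!(p in internal)) print "PKG " p
    }
  ' | LC_ALL=C sort
}
```

It would be used as pds -f | merlin_sketch > .merlin. Fed two hypothetical rows, one for a src project example depending on re and one for a test project test_example depending on example, it prints PKG re, S src/example and S tests/test_example.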

Some other things

  • pds supports documentation builds. For OCaml projects it only looks at *.mli files by default, but this can be changed. To build the documentation, run make -f pds.mk docs.
  • My projects have a Makefile in them that looks like the one below. It executes pds on every build, so I can just do make docs or make test. It also generates the .merlin file on every run:
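# .merlin is phony so that it is regenerated on every run.
.PHONY: .merlin

# Default target: regenerate the Makefiles, then build everything.
all: .merlin
	pds
	$(MAKE) -f pds.mk all

# Forward any other target (docs, test, ...) to pds.mk.
%: .merlin
	pds
	$(MAKE) -f pds.mk $*

# Build the .merlin file from the formatted pds.conf.
.merlin:
	pds -f | merlin-of-pds > .merlin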
  • It's safe to execute pds builds in parallel with the -j option.

Future work

pds has a number of places that can be improved upon. Here is a short list of tentative plans:

  • Automatically rebuild binaries when pds.conf has been changed. It's not a huge problem as configuration and code often change together but it can also cause subtle and annoying bugs.
  • Expose ocamlc and ocamlopt compiler options. The extra_compiler_opts option is for both and this can be problematic. For example, specifying compiler options for profiling requires that ocamlc has one set of options and ocamlopt has another. The ability to specify these independently will make more complicated builds expressible inside pds.
  • Make pds drive more of the build process. Currently pds just generates Makefiles but it would be nice if it could then run them as well. For my projects now, I have a Makefile that executes pds for me but simplifying a layer could make the whole thing smoother. This would work well with automatically rebuilding binaries on configuration changes.
  • Remove the Makefile requirement altogether. This is not to say that there will be no Makefiles, just that they should not be exposed to the user as they are now and could be replaced with something else, such as Bazel, or pds could target ocamlbuild or omake. This could take a few forms: certain types of builds could be guaranteed to be portable, or pds could grow to express more complicated builds. I am leaning towards the former but I'm not sure what that would actually look like.
  • Rethink the configuration variable names. Many of them, such as extra_compiler_opts, exist in their current form for legacy reasons. At some point these should be renamed to be clearer, possibly adjusting the meaning of some of them.

Conclusion

pds was created to scratch an itch that a few colleagues and I had. If it solves a problem you have then I hope you use it; issues, suggestions, and PRs are all appreciated. If you're happy with the existing solutions (as I write this, Jane St has released jbuilder) then I hope pds doesn't get in your way.

To read more about pds, look at the README found in the source: https://bitbucket.org/mimirops/pds/src

Generating OPAM Packages With hll

Once a project is ready to release, the next step is to package it. In OCaml, the standard package manager is OPAM. hll takes a pds.conf and an hll.conf and generates an OPAM package.

To get hll, install it via OPAM with: opam install hll

An hll configuration specifies where the code can be found, a description of the package, authors, extra dependencies, and other OPAM configuration values. Executing hll generate will read the pds.conf and hll.conf, download and checksum the source if it is an HTTP source, and then generate the package configuration. By default, hll uses the current tag as the package version to generate, but that can be overridden.

A package generated in hll is guaranteed to pass opam-lint successfully (as it is defined at the time of the release). That means if opam-lint adds more requirements, those requirements will be reflected in hll.

To create a package from a repository, run hll generate --opam-dir /path/to/opam/repo in the base directory of the repo. To see all of the options, run hll generate --help. The hll command uses the generate sub-command because it will probably get more operations in the future.

An example

The hll.conf for pds is shown below. Line numbers have been added for explanation:

 1: pds = { major_version = 5 }
 2: 
 3: desc = "A tool to build Makefiles for Ocaml projects"
 4: maintainer = "orbitz@gmail.com"
 5: authors = [ "orbitz@gmail.com", "dklee@dklee.org" ]
 6: homepage = "https://bitbucket.org/mimirops/pds"
 7: bug_reports = "https://bitbucket.org/mimirops/pds/issues"
 8: dev_repo = "git@bitbucket.org:mimirops/pds.git"
 9: 
10: build_deps = [ "crunch" ]
11: available = "ocaml-version >= \"4.02\""
12: 
13: url_template = "https://bitbucket.org/mimirops/pds/get/{tag}.tar.gz"
14: url_pattern = "{tag}"
15: url_protocol = "http"
16: deps_blacklist = ["pds"]
  • Line 1 specifies which major version of pds this build must use. Here it is present only because hll requires it; it doesn't make much sense since this is the pds package itself. For all non-pds packages, pds is added as a dependency automatically and hll needs to know which version of pds the build will work with. This makes it easier to make backwards-incompatible changes in pds.
  • Line 3 is the description of the package.
  • Line 4 is the maintainer of the package.
  • Line 5 lists the authors of the code.
  • Line 6 is the homepage for the project.
  • Line 7 is where bugs can be reported.
  • Line 8 is where development for the source code happens.
  • Line 10 is a way to add more OPAM dependencies to the package. In the case of pds, it uses crunch, which is a program for turning a directory structure into OCaml modules. Because it's a program, it's not a dependency listed in pds.conf, which only lists compilation dependencies.
  • Line 11 is an extra package configuration. It states that this package is only available for OCaml versions greater than or equal to 4.02.
  • Line 13 is the template for the URL the source code is downloaded from. This is a template because the URL depends on which version of the package is being generated.
  • Line 14 is the pattern in the URL on line 13 which is replaced with the version.
  • Line 15 is the protocol in the url_template. In the case of http, the protocol could have been inferred, but that is not the case for all protocol types, so it is explicit in the configuration.
  • Line 16 is a list of dependencies that will be found in pds.conf but should not make it to the package. In this case, the pds package is automatically added to every package in hll but since this hll.conf is for the pds program I do not want it to be a dependency for itself.
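
Lines 13 and 14 work together: every occurrence of the pattern in the template is replaced with the version being packaged. The sketch below shows that substitution as a plain literal replacement. It is an illustration rather than hll's implementation; expand_url is a made-up name, and the tag 5.15 assumes a tag with exactly that name exists.

```shell
# expand_url TEMPLATE PATTERN TAG
# Replace every literal occurrence of PATTERN in TEMPLATE with TAG.
expand_url() {
  awk -v t="$1" -v p="$2" -v v="$3" 'BEGIN {
    out = ""
    # index() does a literal search, so "{" and "}" need no escaping.
    while ((i = index(t, p)) > 0) {
      out = out substr(t, 1, i - 1) v
      t = substr(t, i + length(p))
    }
    print out t
  }'
}

expand_url 'https://bitbucket.org/mimirops/pds/get/{tag}.tar.gz' '{tag}' 5.15
# prints: https://bitbucket.org/mimirops/pds/get/5.15.tar.gz
```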

Dependency maps

hll supports a few more configuration options but the most useful that has not been shown is the deps_map section. Because a pds.conf only describes a build and dependencies are ocamlfind names, those dependencies might not be the same as the corresponding OPAM package names. On top of that, something that is multiple compile-time dependencies might be a single OPAM dependency. The deps_map section solves this by mapping the name of an OPAM package to the dependencies in pds.conf.

The following snippet is a piece of an hll.conf, not the whole thing:

[deps_map]
ctypes-foreign = ["ctypes.foreign"]

This states that the OPAM package should depend on a package called ctypes-foreign and that ctypes-foreign consumes the ctypes.foreign dependency, which is in the pds.conf. The value is a list because a single OPAM package may provide multiple ocamlfind packages.
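
The effect of such a map can be sketched as a lookup from ocamlfind names to OPAM package names. This is only an illustration of the idea, not how hll is written: map_deps is an invented name, the map is passed as "opam-package ocamlfind-name" lines rather than TOML, and passing unmapped names through unchanged is my assumption.

```shell
# map_deps MAP DEP...
# MAP holds "opam-package ocamlfind-name" pairs, one pair per line.
# Prints the OPAM package for each DEP, with duplicates removed.
# Assumption: a DEP with no entry in MAP is used as the OPAM name unchanged.
map_deps() {
  map=$1; shift
  for dep in "$@"; do
    printf '%s\n' "$map" | awk -v d="$dep" '
      $2 == d { print $1; found = 1; exit }
      END { if (!found) print d }'
  done | LC_ALL=C sort -u
}

map_deps 'ctypes-foreign ctypes.foreign' ctypes.foreign re
# prints:
# ctypes-foreign
# re
```

The sort -u at the end is what collapses several ocamlfind names that map to the same OPAM package into a single dependency.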

Pinning

hll supports specifying versions for dependencies through an external file; this is called pinning. The file contains one dependency per line, where the first column is the name of the dependency and the second is the OPAM constraints to apply to it. The columns are separated by a single space. I do not actually use pinnings that much. I prefer to pin things directly in OPAM rather than at the package level, but that is not possible for public packages.
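
A sketch of reading a file in that two-column shape follows. The function name read_pins is invented, and the constraint text in the example is only a guess at the syntax, so check the hll README for the form it actually expects.

```shell
# Read a pins file on stdin and print "name<TAB>constraint" for each entry.
# The first space ends the name; everything after it is the constraint.
read_pins() {
  while IFS= read -r line; do
    case $line in
      *' '*) ;;          # has both columns
      *) continue ;;     # skip blank or single-column lines
    esac
    name=${line%% *}
    constraint=${line#* }
    printf '%s\t%s\n' "$name" "$constraint"
  done
}

# Hypothetical entry: the constraint syntax here is an assumption.
printf 're >= "1.7.0"\n' | read_pins
```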

Conclusion

hll is a tool that leverages pds in order to help automate the whole process of packaging OCaml software. hll makes it easy to create a package for testing or to have internal and external packages through internal and external hll.conf files. For most packages, an hll.conf can completely describe what the package should look like and a user does not even need to work with an OPAM package file. I especially find it convenient to be able to generate local OPAM repositories just by running hll generate on the repos that I am working with at that moment. It should also make it easier to turn OCaml projects into other package formats automatically, assuming there is a mapping from OPAM packages to the other package system.

To read more about hll, look at the README found in the source: https://bitbucket.org/mimirops/hll/src/

The Simple Programmer

Part of the reason that pds and hll were created is to experiment with the belief that builds, packaging, and dependency management should be separate. Build systems such as Maven and rebar conflate these operations into one tool. It is the belief of the author that combining this functionality into one tool costs more than the value it offers.

The main argument for combining all of this functionality is that it produces a streamlined experience for the user. The system can know when to rebuild targets, when to download them, and from where to download them. The approach that Maven and rebar take is a single tool which understands the input files and provides hooks to perform other operations. Because the possible operations are infinite, the interface for these tools is either quite large, such as in Maven, or small but incomplete, such as in rebar (at least in rebar2, I have not used rebar3). One is also limited to writing plugins in the language these tools are implemented in rather than the one with which the user feels more comfortable. The input files are generally not designed to be consumed by other tools either as the file schemas are poorly defined or the file formats are not easy to parse. In the case of Maven, this is an XML file which requires complex parsing and in the case of rebar the configuration is Erlang terms.

The approach that pds has taken is to define the file format and schema as the public interface. With that, tools can be written around the input files to perform operations above and beyond the existing tooling. hll, itself, is an example of this. hll was not part of the original idea when implementing pds but once pds was building projects we made some packages for them by hand and realized that it would be nicer if that were automated. We added hll.conf with the metadata needed to generate a package and the hll command reads that and the pds.conf whose schema it can depend on. Similarly, other tools can depend on the schema of hll.conf. With this we've avoided having to add complexities to pds, such as a plugin system, in order to support new use-cases. pds can focus on just being good at building software and hll on generating OPAM packages. An additional benefit is that adding new functionality, such as generating packages for one's favorite package manager, doesn't require reading through an existing code base and understanding how building works or how OPAM package generation works, both of which are likely irrelevant to making the new feature.

In my opinion, this experiment has been a success. pds and hll have fulfilled my needs and I've converted all of my existing projects over to them without issue. While it's hard to say, as I understand these tools inside and out, I believe them to be simpler to use. The line count (just using wc -l) of these tools combined is 1108 lines, making the worst case of having to read the code to understand them quite manageable. For comparison, the line count of all the Java source files in Maven is 128,416 lines. The comparison is not entirely fair as that includes tests and Maven does more than pds and hll but even so, that is two orders of magnitude more lines of code.

Like any pattern, it's not applicable everywhere and it's up to the judgment of the user as to where to use it. But when making a new piece of code, consider if maybe more of it should be pushed down into configuration that can be shared.

Updated: 2016-12-19 Mon 21:05