OCaml: Building and Packaging
Building OCaml Projects With pds
pds is a simple build system designed for small projects. It targets Unix-like
OS's at the moment and turns a TOML file into a set of Makefiles. This post is
written for pds major version 5.
To get pds, install it via OPAM with: opam install pds
Why does pds exist when there are other solutions like omake and OCamlbuild? The belief of the author is that these tools are fine but a bit too complicated or verbose for many basic projects. pds is less expressive than existing tools in the hopes that it makes the process simpler and easier. pds is meant to play well with other tools, so if someone does not like pds or has a complicated build that is part pds and part something else, it shouldn't get in the way.
There are a few reasons for Makefiles as the output. First, it simplifies pds considerably. pds only needs to know how to generate a Makefile for OCaml builds rather than be a generic build system. Second, it allows users to extend a build by hooking in their own Makefiles to do the extra work. Finally, something that can execute a Makefile exists on almost all platforms, making them a fairly portable solution. The Future work section proposes some changes that could abstract more of the Makefiles away, making pds more portable.
A simple pds project
A project that uses pds has two things:
- The code in a particular directory layout.
- A pds.conf file that describes the build.
When the pds program is run, it will read the pds.conf and produce several
Makefiles. The entry point to these Makefiles is pds.mk, which supports a
number of targets, including all (which builds everything), test, and
install.
Here's an example. First, create the directory structure and source files.
- Create a directory called pds_example and go into it.
- Create directories called src/example and tests/test_example.
- In src/example create a file called example.ml and in it put let foo = 4.
- In tests/test_example create a file called test.ml and in it put let () = assert (Example.foo = 4).
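
The steps above can also be scripted. This is a minimal sketch that only creates the directory layout and the two source files from the list; the helper name make_example is made up for illustration and is not something pds provides.

```shell
#!/bin/sh
# Create the example layout under a root directory (default: pds_example).
# make_example is a hypothetical helper name, not part of pds.
make_example() {
    root=${1:-pds_example}
    mkdir -p "$root/src/example" "$root/tests/test_example" &&
        printf 'let foo = 4\n' > "$root/src/example/example.ml" &&
        printf 'let () = assert (Example.foo = 4)\n' \
            > "$root/tests/test_example/test.ml"
}

make_example pds_example
```

After running it, pds.conf still has to be written in the root of pds_example by hand.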
Now with this, make pds.conf in the root directory of the project:

    [src.example]
    install = false

    [tests.test_example]
    deps = ["example"]

This pds.conf says that there is a source project called example and to not
install it when the install target is run. There is a collection of tests
called test_example and they depend on the example source project.
With that, the code can be built and tested:
- Run pds to generate the Makefiles.
- Run make -f pds.mk to build everything (on *BSD use gmake).
- Run make -f pds.mk test to run the tests. Running the test target will also build, or rebuild if necessary, anything the test depends on.
That's it for a basic setup. A few notes:
- The pds program needs to be run before every build so that it can regenerate the Makefiles when new source files have been added or pds.conf has been updated.
- Source projects build a library or executable based on all of the source files in the directory.
- For OCaml projects, both native and byte code output will always be generated and tested.
- For tests, each individual file is its own test, so tests/test_example/ could have had multiple .ml files in it and each one would be an individual test. Running the test target will compile and execute each one of those. For performance reasons it might be desirable to have multiple tests in a single executable. How to accomplish that is entirely up to which test harness you use. I generally have one test file per aspect of the thing I'm testing with several tests in that file.
- A test is considered failed if the executable has a non-zero exit code.
- If you have changed pds.conf you might have to do make -f pds.mk clean. Currently pds does not track if a change to pds.conf requires a rebuild. Adding a new project to the configuration does not require a rebuild because it will just build the new sources.
All output is placed into the build directory which has sub-directories for
the type of build the files correspond to such as build/release or
build/debug.
Some more advanced things
Often there are multiple kinds of builds one wants, such as release, debug, and
profile. pds supports these, although it does not attach any meaning to them;
they are just build names that are available for configuration. The release
build is the default build for everything and must exist. The other builds
will have the same configuration as the release build unless overridden. One
can override options at a global level as well as in an individual project
configuration.
For example, in order to debug the example project it needs to have some debug
compile options specified. To do this, add a debug section to the pds.conf.
The full file will now look like:

    [src.example]
    install = false
    debug = { extra_compiler_opts = "-g" }

    [tests.test_example]
    deps = ["example"]

One could also have made a separate section in the TOML called
[src.example.debug]. I prefer to do it with debug = ... because it's easy
to make a typo in a separate section name, which pds will not detect.
Now execute the debug build: make -f pds.mk debug
pds handles release and debug builds coexisting just fine because it compiles
them into separate build directories. As mentioned before, the current
released version, 5.15, does not handle modifying pds.conf and
rebuilding properly. One will need to perform a clean and then build again to
make sure everything is built with the new options.
A debug test can also be run, which will run the tests but using the debug
versions of the projects. To do this run: make -f pds.mk test-debug. The
same applies for profile builds as well.
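
For several named builds in a row, the target names follow the pattern described above: the build name itself, then test- followed by the build name. A small sketch, where build_variants and the PDS and MAKE_CMD overrides are names invented for this example:

```shell
#!/bin/sh
# Build each named build (for example debug or profile) and run its tests.
# build_variants, PDS and MAKE_CMD are hypothetical names for this sketch.
build_variants() {
    make_cmd=${MAKE_CMD:-make}
    "${PDS:-pds}" || return 1
    for build in "$@"; do
        "$make_cmd" -f pds.mk "$build" &&
            "$make_cmd" -f pds.mk "test-$build" || return 1
    done
}
```

For example, build_variants debug profile builds and tests the debug build, then the profile build.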
Sometimes one wants default build options that apply to every project. These
go in a global section:

    [global.debug]
    extra_compiler_opts = "-g"

    [src.example]
    install = false

    [tests.test_example]
    deps = ["example"]

The precedence rules are that the configuration in a project always wins
followed by the global section and finally by defaulting to the release
build options.
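
The lookup order can be pictured as "first value that is set wins". The function below is only an illustration of that order, not how pds implements it; it assumes an empty string means "not set", and resolve_opt is a made-up name.

```shell
#!/bin/sh
# Print the first non-empty argument. Pass the candidates in precedence
# order: project value, then global value, then the release value.
# resolve_opt is a hypothetical helper used only to illustrate the order.
resolve_opt() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# A project without its own debug options falls back to the global ones.
resolve_opt "" "-g -safe-string" "-safe-string -no-assert"
```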
A common pattern I find myself doing with globals is something like:

    [global.release]
    extra_compiler_opts = "-safe-string -no-assert"

    [global.debug]
    extra_compiler_opts = "-g -safe-string"

    [src.example]
    install = false

    [tests.test_example]
    deps = ["example"]

Of course, projects often have more configuration than just if they should be
installed or not. The most common is dependencies, which are specified as a
list of names which can be found through ocamlfind. For example, if the
example project depended on the regular expression package, re, to build, it
would look like:

    [src.example]
    install = false
    deps = ["re"]

    [tests.test_example]
    deps = ["example"]

Again, be sure to run pds first, then make -f pds.mk clean, then make -f
pds.mk to build. The clean is only necessary if you didn't change a source
code file as well. If a source code file was changed to use the regular
expression module at the same time that pds.conf was updated, the source
will recompile just fine.
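
Until pds tracks configuration changes itself, a wrapper can decide when the clean is needed. The sketch below is a workaround I am assuming, not a pds feature: it stores a checksum of pds.conf in a stamp file (the name .pds.conf.cksum is made up) and cleans only when the checksum differs. PDS and MAKE_CMD are hypothetical overrides.

```shell
#!/bin/sh
# True when pds.conf differs from the recorded checksum, or when no
# checksum has been recorded yet. The stamp file name is made up.
conf_changed() {
    conf=${1:-pds.conf}
    stamp=${2:-.pds.conf.cksum}
    new=$(cksum < "$conf") || return 2
    [ ! -f "$stamp" ] || [ "$new" != "$(cat "$stamp")" ]
}

# Remember the checksum of the configuration that was just built.
record_conf() {
    cksum < "${1:-pds.conf}" > "${2:-.pds.conf.cksum}"
}

# Regenerate the Makefiles, clean if pds.conf changed, then build.
rebuild() {
    make_cmd=${MAKE_CMD:-make}
    "${PDS:-pds}" || return 1
    if conf_changed; then
        "$make_cmd" -f pds.mk clean || return 1
    fi
    "$make_cmd" -f pds.mk && record_conf
}
```

This cleans more often than strictly necessary (adding a new project also changes the checksum), which trades some build time for not having to think about it.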
The deps option can also refer to internal packages, as in the case of
tests.test_example. pds will determine which deps are internal and which are
external and create the correct build order.
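
To make the idea of a build order concrete, here is a sketch that uses the standard tsort utility. It is not how pds computes the order; the input format (project name, a tab, then a comma separated dependency list) and the name build_order are assumptions made for this example. Dependencies that are not themselves listed as projects are treated as external and ignored.

```shell
#!/bin/sh
# Read "<project><TAB><dep,dep,...>" rows (a made-up format) and print
# the projects in an order where internal dependencies come first.
build_order() {
    awk -F '\t' '
        { names[$1] = 1; proj[NR] = $1; deps[NR] = $2 }
        END {
            for (r = 1; r <= NR; r++) {
                # A pair of identical items tells tsort the node exists.
                print proj[r], proj[r]
                n = split(deps[r], d, ",")
                for (i = 1; i <= n; i++)
                    if (d[i] in names)   # internal: must be built first
                        print d[i], proj[r]
            }
        }
    ' | tsort
}

# test_example depends on example (internal); example depends on re.
printf 'test_example\texample\nexample\tre\n' | build_order
```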
Formatted pds.conf
A final feature that pds has is the -f option, which stands for "formatted".
Running pds -f in a project will output a simplified view of the pds.conf.
The output is tab separated with the following columns:
| Column Index | Description |
|---|---|
| 1 | Type of build (src or test) |
| 2 | Name of project |
| 3 | Build type (library, exec, empty for third-party) |
| 4 | Project type (ocaml, third-party) |
| 5 | Comma separated list of dependencies |
This feature is useful for simple interpretation of a pds configuration. An
example of this is merlin-of-pds which is a simple awk script to take the
formatted output and create a .merlin file.
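
As an illustration of consuming this format, the sketch below turns such rows into a few .merlin directives. It is not the actual merlin-of-pds script; it assumes that sources live in src/NAME and tests/NAME as in the example project, that any dependency which is not a listed project is an ocamlfind package, and the sample rows in the usage line are made up.

```shell
#!/bin/sh
# Turn tab separated rows (type, name, build type, project type, deps)
# into .merlin source-path (S) and package (PKG) directives.
merlin_from_formatted() {
    awk -F '\t' '
        {
            internal[$2] = 1
            # Column 1 is "src" or "test"; tests live under tests/.
            dir = ($1 == "test" ? "tests" : $1) "/" $2
            if ($4 == "ocaml")
                print "S " dir
            n = split($5, d, ",")
            for (i = 1; i <= n; i++)
                if (d[i] != "" && !(d[i] in seen)) {
                    seen[d[i]] = 1
                    deps[++ndeps] = d[i]
                }
        }
        END {
            # Dependencies that are not listed projects are assumed to be
            # ocamlfind packages.
            for (i = 1; i <= ndeps; i++)
                if (!(deps[i] in internal))
                    print "PKG " deps[i]
        }
    '
}

printf 'src\texample\tlibrary\tocaml\tre\ntest\ttest_example\texec\tocaml\texample\n' |
    merlin_from_formatted
```

In a real project the input would come from pds -f instead of printf.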
Some other things
- pds supports documentation builds. For OCaml projects it, by default, only looks at *.mli files but this can be modified. In order to build the documentation, run make -f pds.mk docs.
- My projects have a Makefile in them that looks like the one below; it executes pds on every build. With this Makefile I can just do make docs or make test. It will also generate the .merlin file on every run:

      .PHONY: .merlin

      all: .merlin
      	pds
      	$(MAKE) -f pds.mk all

      %: .merlin
      	pds
      	$(MAKE) -f pds.mk $*

      .merlin:
      	pds -f | merlin-of-pds > .merlin

- It's safe to execute pds builds in parallel with the -j option.
Future work
pds has a number of places that can be improved upon. Here is a short list of tentative plans:
- Automatically rebuild binaries when pds.conf has been changed. It's not a huge problem as configuration and code often change together but it can also cause subtle and annoying bugs.
- Expose ocamlc and ocamlopt compiler options. The extra_compiler_opts option is for both and this can be problematic. For example, specifying compiler options for profiling requires that ocamlc has one set of options and ocamlopt has another. The ability to specify these independently will make more complicated builds expressible inside pds.
- Make pds drive more of the build process. Currently pds just generates Makefiles but it would be nice if it could then run them as well. For my projects now, I have a Makefile that executes pds for me but simplifying a layer could make the whole thing smoother. This would work well with automatically rebuilding binaries on configuration changes.
- Remove the Makefile requirement altogether. This is not to say that there will be no Makefiles, just that they should not be exposed to the user as they are now and could be replaced with something else, such as Bazel, or target OCamlbuild or omake. This could take a few forms, such as certain types of builds being guaranteed to be portable or pds growing to express more complicated builds. I am leaning towards the former but I'm not sure what that would actually look like.
- Rethink the configuration variable names. Many of them exist in their current forms, such as extra_compiler_opts, due to legacy. At some point these should be remade to be clearer, possibly adjusting the meaning of some of them.
Conclusion
pds was created to scratch an itch that a few colleagues and I had. If it solves a problem you have then I hope you use it; issues, suggestions, and PRs are appreciated. If you're happy with the existing solutions (as I write this, Jane St has released jbuilder) then I hope pds doesn't get in your way.
To read more about pds, look at the README found in the source: https://bitbucket.org/mimirops/pds/src
Generating OPAM Packages With hll
Once a project is ready to release, the next step is to package it. In OCaml,
the standard package manager is OPAM. hll takes a pds.conf and an hll.conf
and generates an OPAM package.
To get hll, install it via OPAM with: opam install hll
An hll configuration specifies where the code can be found, a description of the
package, authors, extra dependencies, and other OPAM configuration values.
Executing hll generate reads the pds.conf and hll.conf, downloads and
checksums the source if it is an HTTP source, and then generates the package
configuration. By default, hll takes the current tag as the package version to
generate but that can be overridden.
A package generated in hll is guaranteed to pass opam-lint successfully (as it is defined at the time of the release). That means if opam-lint adds more requirements, those requirements will be reflected in hll.
To create a package from a repository, run hll generate --opam-dir
/path/to/opam/repo in the base directory of the repo. To see all of the
options, run hll generate --help. The hll command uses the generate
sub-command because it will probably get more operations in the future.
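
When several repositories should end up in the same local OPAM repository, the command can be run in each of them. A sketch, where generate_all and the HLL override variable are names invented for this example; only hll generate --opam-dir comes from hll itself:

```shell
#!/bin/sh
# Run `hll generate` in the base directory of each repository so that
# all packages land in one OPAM repository directory.
# Usage: generate_all /absolute/path/to/opam/repo REPO_DIR...
# The OPAM directory should be absolute because the function changes
# directory. generate_all and HLL are hypothetical names.
generate_all() {
    opam_dir=$1
    shift
    for repo in "$@"; do
        ( cd "$repo" && "${HLL:-hll}" generate --opam-dir "$opam_dir" ) ||
            return 1
    done
}
```

Setting HLL=echo gives a dry run that prints the arguments instead of generating anything.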
An example
The hll.conf for pds looks like the one below. Line numbers have been added for
explanation:

     1: pds = { major_version = 5 }
     2:
     3: desc = "A tool to build Makefiles for Ocaml projects"
     4: maintainer = "orbitz@gmail.com"
     5: authors = [ "orbitz@gmail.com", "dklee@dklee.org" ]
     6: homepage = "https://bitbucket.org/mimirops/pds"
     7: bug_reports = "https://bitbucket.org/mimirops/pds/issues"
     8: dev_repo = "git@bitbucket.org:mimirops/pds.git"
     9:
    10: build_deps = [ "crunch" ]
    11: available = "ocaml-version >= \"4.02\""
    12:
    13: url_template = "https://bitbucket.org/mimirops/pds/get/{tag}.tar.gz"
    14: url_pattern = "{tag}"
    15: url_protocol = "http"
    16: deps_blacklist = ["pds"]

- Line 1 specifies which major version of pds this build must use. hll requires it, even though it doesn't make much sense here because this is the pds package itself. For all non-pds packages, pds is added as a dependency automatically and hll needs to know which version of pds the build will work with. This makes it easier to make backwards-incompatible changes in pds.
- Line 3 is the description of the package.
- Line 4 is the maintainer of the package.
- Line 5 lists the authors of the code.
- Line 6 is the homepage for the project.
- Line 7 is where bugs can be reported.
- Line 8 is where development for the source code happens.
- Line 10 is a way to add more OPAM dependencies to the package. In the case of pds, it uses crunch, which is a program for turning a directory structure into OCaml modules. Because it's a program, it's not a dependency listed in pds.conf, which only contains compilation dependencies.
- Line 11 is an extra package configuration. It states that this package is only available for OCaml versions greater than or equal to 4.02.
- Line 13 is the template for the URL the source code is downloaded from. This is a template because the URL depends on which version of the package is being generated.
- Line 14 is the pattern in the URL on line 13 which is replaced with the version.
- Line 15 is the protocol in the url_template. In the case of http, the protocol could have been inferred, but that is not the case for all protocol types, so it is explicit in the configuration.
- Line 16 is a list of dependencies that will be found in pds.conf but should not make it to the package. In this case, the pds package is automatically added to every package in hll but since this hll.conf is for the pds program I do not want it to be a dependency for itself.
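
The relationship between url_template and url_pattern can be sketched as a plain text substitution. This only illustrates the replacement described for lines 13 and 14; expand_url is a made-up helper, and 5.15 is used as an example version, not necessarily the name of a real tag.

```shell
#!/bin/sh
# Replace every occurrence of a pattern in a template with a value.
# Caveats of this sketch: sed treats the pattern as a basic regular
# expression, and neither pattern nor value may contain a "|".
# expand_url is a hypothetical helper name.
expand_url() {
    template=$1
    pattern=$2
    value=$3
    printf '%s\n' "$template" | sed "s|$pattern|$value|g"
}

expand_url 'https://bitbucket.org/mimirops/pds/get/{tag}.tar.gz' '{tag}' '5.15'
```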
Dependency maps
hll supports a few more configuration options but the most useful that has not
been shown is the deps_map section. Because a pds.conf only describes a
build and dependencies are ocamlfind names, those dependencies might not be the
same as the corresponding OPAM package names. On top of that, several
compile-time dependencies might be provided by a single OPAM package. The
deps_map section solves this by mapping the name of an OPAM package to the
dependencies in pds.conf.
The following snippet is a piece of an hll.conf, not the whole thing:

    [deps_map]
    ctypes-foreign = ["ctypes.foreign"]

This states that the OPAM package should depend on a package called
ctypes-foreign and that ctypes-foreign consumes the ctypes.foreign
dependency, which is in the pds.conf. The value is a list because a single
OPAM package may provide multiple ocamlfind packages.
Pinning
hll supports specifying versions for dependencies through an external file; this is called pinning. The file contains one dependency per line, where the first column is the name of the dependency and the second is the OPAM constraints to apply to it. The columns are separated by a single space. I do not actually use pinnings that much. I prefer to pin things directly in OPAM rather than at the package level, but that is not possible for public packages.
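
To show the shape of such a file, here is a sketch that splits each line at the first space. The function name read_pins is made up, and the constraint syntax in the usage comment is an assumption for illustration; check the hll README for what it actually accepts.

```shell
#!/bin/sh
# Read a pins file: one dependency per line, name and constraint
# separated by a single space. Prints "name<TAB>constraint" per line.
# read_pins is a hypothetical helper, not part of hll.
read_pins() {
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        case "$line" in
            *" "*)
                name=${line%% *}        # text before the first space
                constraint=${line#* }   # text after the first space
                ;;
            *)
                name=$line
                constraint=
                ;;
        esac
        printf '%s\t%s\n' "$name" "$constraint"
    done < "$1"
}

# Example file contents (made up):
#   re >= "1.7.0"
#   crunch = "1.4.1"
```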
Conclusion
hll is a tool that leverages pds in order to help automate the whole process of
packaging OCaml software. hll makes it easy to create a package for testing or
to have internal and external packages through internal and external hll.conf
files. For most packages, an hll.conf can completely describe what the
package should look like and a user does not even need to work with an OPAM
package file. I especially find it convenient to be able to generate local OPAM
repositories just by running hll generate on the repos that I am working with
at that moment. It should also make it easier to turn OCaml projects into other
package formats automatically, assuming there is a mapping from OPAM packages to
the other package system.
To read more about hll, look at the README found in the source: https://bitbucket.org/mimirops/hll/src/
The Simple Programmer
Part of the reason that pds and hll were created is to experiment with the belief that builds, packaging, and dependency management should be separate. Build systems such as Maven and rebar conflate these operations into one tool. It is the belief of the author that combining this functionality into one tool costs more than the value it offers.
The main argument for combining all of this functionality is that it produces a streamlined experience for the user. The system can know when to rebuild targets, when to download them, and from where to download them. The approach that Maven and rebar take is a single tool which understands the input files and provides hooks to perform other operations. Because the possible operations are infinite, the interface for these tools is either quite large, such as in Maven, or small but incomplete, such as in rebar (at least in rebar2, I have not used rebar3). One is also limited to writing plugins in the language these tools are implemented in rather than the one with which the user feels more comfortable. The input files are generally not designed to be consumed by other tools either: the file schemas are poorly defined or the file formats are not easy to parse. In the case of Maven, this is an XML file which requires complex parsing and in the case of rebar the configuration is Erlang terms.
The approach that pds has taken is to define the file format and schema as the
public interface. With that, tools can be written around the input files to
perform operations above and beyond the existing tooling. hll, itself, is an
example of this. hll was not part of the original idea when implementing pds
but once pds was building projects we made some packages for them by hand and
realized that it would be nicer if that were automated. We added hll.conf
with the metadata needed to generate a package and the hll command reads that
and the pds.conf whose schema it can depend on. Similarly, other tools can
depend on the schema of hll.conf. With this we've avoided having to add
complexities to pds, such as a plugin system, in order to support new use-cases.
pds can focus on just being good at building software and hll on generating OPAM
packages. An additional benefit is that adding new functionality, such as
generating packages for one's favorite package manager, doesn't require reading
through an existing code base and understanding how building works or how OPAM
package generation works, both of which are likely irrelevant to making the new
feature.
In my opinion, this experiment has been a success. pds and hll have fulfilled
my needs and I've converted all of my existing projects over to them without
issue. While it's hard to say, as I understand these tools inside and out, I
believe them to be simpler to use. The line count (just using wc -l) of these
tools combined is 1108 lines, making the worst case of having to read the code
to understand them quite manageable. For comparison, the line count of all the
Java source files in Maven is 128,416 lines. The comparison is not entirely
fair as that includes tests and Maven does more than pds and hll but even so,
that is two orders of magnitude more lines of code.
Like any pattern, it's not applicable everywhere and it's up to the judgment of the user as to where to use it. But when making a new piece of code, consider if maybe more of it should be pushed down into configuration that can be shared.
