Beaver is a minimal build system geared towards scientific programming and reproducibility. It uses the python programming language to express how transforms generate outputs from inputs. If you’re familiar with python, using Beaver couldn’t be easier, as we will demonstrate by example.
# A simple example (saved as `beaver.py`) to generate `output.txt` with content `hello`.
import beaver_build as bb
transform = bb.Shell(outputs="output.txt", inputs=None, cmd="echo hello > output.txt")
Executing Beaver from the command line generates the desired output.
$ beaver output.txt
🦫 INFO: 🟡 artifacts [output.txt] are stale; schedule transform
🦫 INFO: ⚙️ execute shell command `echo hello > output.txt`
🦫 INFO: ✅ generated artifacts [output.txt]
$ cat output.txt
This seems like a convoluted way to write
output.txt. So what’s going on? The statement
bb.Shell(...) defines a
Transform that generates the
output.txt by executing the shell command
echo hello > output.txt. Executing
beaver output.txt asks Beaver to generate the artifact–which it gladly does.
Why should we care? Transforms can be chained by using the outputs of one as the inputs for another. Beaver ensures that all transforms are executed in the correct order and parallelizes steps where possible. These are of course the tasks of any build system, but Beaver’s unique selling points are (see Why not use …? for further details):
users do not need to learn a domain-specific language but use flexible python syntax to create and chain transforms.
scheduling of operations is delegated to python’s
asynciopackage which both minimizes the potential for bugs (compared with a custom implementation) and simplifies parallelization.
Other features include:
incremental builds based on artifact digests–whether artifacts are files or not.
Why not use …?
ant uses relatively verbose XML syntax and limited in its flexibility, e.g. transforms cannot be easily generated on the fly.
bazel focuses on speed and correctness–which it does extremely well. Bazel achieves these goals by “[taking] some power out of the hands of engineers”. This is a good compromise for production systems, but, for scientific applications, we want to retain a high degree of flexibility.
maven is primarily Java focused and relies on conventions to generate artifacts. Well-established conventions are essential for software development, especially in large teams, but are often lacking in the context of investigating a new scientific problem.
pydoit uses standard python syntax to collect task metadata akin to test discovery in pytest. However,
dodo.pyfiles are sometimes difficult to read because the code does not directly express the tasks to execute.
snakemake uses a non-standard python syntax, steepening the learning curve.