For nearly as long as I have been using NixOS, krops has been my deployment tool of choice. It is not as well-known as alternatives like NixOps or Colmena, but has served me well for a long time. However, it does not yet support flakes natively and also lacks some other features like parallel deployments and the flexibility to run individual deployment steps.
I recently stumbled upon go-task and after playing a bit with it, I immediately had the idea of writing a deployment tool with it to address everything I was missing in krops. Lollypops (Lollypop Operations) has been in use successfully on all my machines since then and is still evolving.
Why another deployment tool?
There are already numerous deployment tools for NixOS out there, so the question arises why anyone would want to write another one. In short, I have tried most of them, and each either lacked part of the feature set I wanted or had other limitations. Let me go over my deployment tool journey.
NixOps
The first deployment tool anyone will encounter when using NixOS beyond the simple nixos-rebuild switch is probably NixOps. It is more "official" than other choices and well-tested. However, while it surely has its place in other scenarios, it is not for me. First of all, it is not stateless: it keeps a SQLite database with the status of all machines, which limits deployment to a single central machine. Sure, I could sync the state across multiple machines, but that is an unnecessary complication in my opinion. Also, at least at the time I was using it, secrets deployment was very rudimentary and only supported clear-text secrets, which is not optimal and prevents me from pushing my NixOS configurations to a public repository. There are workarounds for this, and NixOps can certainly be extended, but given its stateful nature I don't see the point of going this route for my use case.
Colmena
Colmena does a lot of things right. In fact, it seems to be a very nice piece of software and includes most of the features I would like to see in a modern NixOS deployment tool. However, there are also some limitations that keep me from switching to it. First off, I don't really like the choice of configuration format for deployments.
Instead of putting the deployment options in the configuration of the host itself, Colmena requires a separate flake output:
{
  colmena = {
    meta = {
      nixpkgs = import nixpkgs {
        system = "x86_64-linux";
      };
    };
    host-a = { name, nodes, pkgs, ... }: {
      boot.isContainer = true;
      time.timeZone = nodes.host-b.config.time.timeZone;
    };
    host-b = {
      deployment = {
        targetHost = "somehost.tld";
        targetPort = 1234;
        targetUser = "luser";
      };
      boot.isContainer = true;
      time.timeZone = "America/Los_Angeles";
    };
  };
}
I find this non-standardized flake output suboptimal. For example, having all deployment options for all hosts in a central output makes it complicated to move a host to a different flake. A more elegant solution in my opinion would be to make the deployment and secrets management part of the host itself (part of nixosConfigurations.<hostname>).
The second and probably more decisive factor for me is complexity. This might be highly subjective, but just like NixOps, Colmena consists of a substantial amount of code (in this case Rust) that does a lot behind the scenes. That might not be a problem in itself if you are willing to read it all or never run into cases where you have to debug much, but in my opinion it is complexity that is not needed to perform the tasks we are trying to accomplish. The configuration of my systems and infrastructure is something I want to be able to understand down to the last detail, and this seems like an unnecessary source of complexity which, besides making debugging harder, also makes building on the tool and extending it problematic. This aspect is one that led me to the last tool compared here, krops, which I was using before.
Krops
What is actually needed to deploy a NixOS system? In general, you will in most cases:
- Decrypt/fetch and copy secrets to the target
- Run nixos-rebuild switch
- Optionally, perform other steps like copying over the configuration
If you are thinking that you could do this with a simple bash script for each host, you are right. Krops is a tool that will generate that script for you from a configuration file written in Nix.
This means in practice, that you configure your systems and get a nix file with multiple outputs, each one corresponding to a host or a group of hosts. To deploy you then run
# Generate deployment script
# Deploy to the host
This works very well and has minimal overhead. If something goes wrong you can always just look into the generated ./result, which is just a shell script, and execute the commands in there manually to find what is causing the problem. In fact, it works so well that this is what I have been using for multiple machines for quite some time now.
However, there are some drawbacks. For one, it is again a separate configuration for the deployment, even more so than with Colmena, because it is even a separate file. Krops was not built with flakes in mind, and while it surely works with them, it would be much nicer to have it as a flake output. This is not possible without forking it, as there are non-pure statements in the code which are not allowed in a flake build.
A more serious drawback is that it is not parallel at all. You can find hacky workarounds to execute multiple copies of it at once, but in general the hosts have to be built one after another in sequential order. That is only acceptable if you have a few hosts or a lot of patience. In addition, the secrets management is built around pass. I like that password manager, but I would prefer to be able to use any secrets manager I like and plug it into the deployment tool. The secrets management itself is quite rudimentary: secrets just get copied with root-only permissions to a specified directory, and from there it's your task to set the correct permissions or do with them whatever you want.
Lastly, of course there is a certain element of interest in writing this for the sake of learning how to do it.
Requirements
Let's recap what I am looking for and specify some requirements. This list is not exhaustive, but contains the key shortcomings of existing solutions that I want to address and the points that are crucial for the tool to be usable for me.
- Stateless: More specifically, storing data on the machine that initiates the deployment (I'll call it the local host) is undesirable. The state should be on the host we deploy to (the remote host).
- Parallel: There should be the option to deploy to multiple hosts in parallel or do various steps on one host in parallel. Of course, we have to think about dependencies when running tasks asynchronously too.
- Pluggable secrets backend: I've been using pass for now, but would like the option to use any backend or manager that has a CLI or other kind of API. Some interesting options would be HashiCorp Vault for larger setups and Bitwarden via its CLI tools for personal stuff.
- Flakes-first configuration: Flakes are here to stay. The tool should support them as first-class citizens and make use of their outputs. This includes .apps for running the tool.
- Deployment options in the host's configuration: The deployment and specification of secrets are part of the configuration of the host and should therefore be in its NixOS configuration.
- Thin and simple: For lack of a better title, I want as much code as possible to be Nix code itself and the tool to be easily understandable. Looking at the source should explain its function to anyone who has a basic understanding of Nix and NixOS.
- Debuggable: Not only should it be simple to understand what is happening, but I want good debugging capabilities. This includes showing all the commands that are being run, the ability to control output and the ability to run individual steps separately. Also, it should always be possible to still run nixos-rebuild on the remote host itself in case we can't deploy from another host or need remote and local to be the same machine.
- Extensible: It only makes sense to think about extensibility. While all steps should have sensible defaults, I'd like to make them configurable. It should be possible to run additional commands before, during and after the deployment. This could be useful to do things like hardware provisioning with Terraform or other tools.
- Performant, Secure, Maintainable: The usual stuff, should be self-explanatory.
About go-task/task
Task (https://taskfile.dev/) is a task runner / build tool that aims to be simpler and easier to use than, for example, GNU Make. Since it's written in Go, Task is just a single binary and has no other dependencies, which means you don't need to mess with any complicated install setups just to use a build tool.
Similar to a Makefile, the steps to be run are specified in a file, in this case YAML. The syntax is quite self-explanatory if you have used any form of YAML-based provisioning or CI. A simple Taskfile can look like this:
version: '3'
tasks:
  build:
    cmds:
      - go build -v -i main.go
  assets:
    cmds:
      - minify -o public/style.css src/css
There is also support for templating and more complex syntax structures, but we'll be using Nix for most of the configuration.
Basic Concept
While the YAML format has its limitations (e.g. the Norway Problem) and there are more expressive formats out there, it has undoubtedly become an industry standard. So much so, in fact, that certain professions are starting to be jokingly called "YAML Engineers".
For NixOS this has been a blessing, even though it was never a consideration. YAML is (99.99% of the time) a strict superset of JSON. Yes, JSON is valid YAML.
Mix this with the built-in Nix function toJSON and you have a pretty universal and effective way to generate configuration from Nix code for any application that understands YAML.
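As a minimal sketch of this idea, the expression below writes the sample Taskfile from the previous section as a Nix attribute set and serializes it with builtins.toJSON. It uses only builtins; the file name in the comment is an assumption for illustration, and this is not the code Lollypops itself uses.

```nix
# taskfile.nix (hypothetical file name)
# Evaluate with: nix-instantiate --eval taskfile.nix
let
  # The Taskfile expressed as a plain Nix attribute set.
  taskfile = {
    version = "3";
    tasks = {
      build.cmds = [ "go build -v -i main.go" ];
      assets.cmds = [ "minify -o public/style.css src/css" ];
    };
  };
in
# The result is a JSON string. Because JSON is valid YAML, it can be
# written to a Taskfile.yml and read by a YAML parser unchanged.
builtins.toJSON taskfile
```

The attribute set can be built with any Nix function, so loops over hosts or conditionals on options need no YAML templating at all.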
Lollypops allows deploying two things: the system configuration and secrets. I will use this to make both part of the NixOS host's configuration and generate a Taskfile.yml from it dynamically, which go-task can execute.
Flakes first
Since I only use flake-based setups, Lollypops assumes this new style of NixOS configuration. In contrast to Colmena and others, I won't create an additional flake output though; it is all part of the nixosConfigurations.
nixosConfigurations = {
  host1 = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [
      lollypops.nixosModules.lollypops # Import the lollypops module
      ./configuration.nix # Other configuration
      {
        lollypops = { # Here goes our config for the host
          deployment = { /* Deployment configuration */ };
          secrets = { /* Secrets configuration */ };
        };
      }
    ];
  };
};
From here, it is "just" a matter of generating a Taskfile.yml from the options set and providing a flake app that can run it.
Implementation
Let's start with what the user will face: the final executable. For now, this is astonishingly simple. All the app output does is run go-task from nixpkgs with the generated taskfile passed as the first parameter, followed by any arguments from the user.
{
  drv = pkgs.writeShellScriptBin "go-task-runner" ''
    ${pkgs.go-task}/bin/task -t ${taskfile} "$@"
  '';
};
The taskfile is of course where most of the magic happens. In total, we want to generate a Taskfile.yml with the following tasks:
For each host:
- Copy the configuration (flake) to it
- Copy the secrets to it
- Run the rebuild for the host
In fact, this is exactly what Lollypops generates. This is an excerpt for two hosts called porree and ahorn, shortened for clarity:
Apart from the steps outlined above, it makes use of some of Go-Task's features: By using dependencies and the include statement, most of the steps can be reused with parameters. In addition to the two hosts, there is a special all task which, you guessed it, runs all of them.
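To illustrate the structure described here, the following sketch builds a top-level Taskfile for the two hosts in pure Nix. It is an approximation under assumptions, not the file Lollypops emits: the relative path to CommonTasks.yml, the HOSTNAME variable passed to the include, and the helper names mkInclude and mkHostTask are chosen for illustration.

```nix
# Evaluate with: nix-instantiate --eval --strict --json hosts-taskfile.nix
let
  # Host names from the text. In a flake these could come from
  # builtins.attrNames self.nixosConfigurations.
  hosts = [ "ahorn" "porree" ];

  # Steps every host runs, in order.
  steps = [ "check-vars" "deploy-flake" "deploy-secrets" "rebuild" ];

  # One include per host, so the shared tasks become <host>:<step>.
  mkInclude = host: {
    name = host;
    value = {
      taskfile = "./CommonTasks.yml"; # assumed path
      vars.HOSTNAME = host;
    };
  };

  # One task per host that calls its namespaced steps one after another.
  mkHostTask = host: {
    name = host;
    value.cmds = map (step: { task = "${host}:${step}"; }) steps;
  };
in
{
  version = "3";
  includes = builtins.listToAttrs (map mkInclude hosts);
  tasks = builtins.listToAttrs (map mkHostTask hosts) // {
    # "all" depends on every host task; go-task runs deps in parallel.
    all.deps = hosts;
  };
}
```

Listing the steps under cmds keeps their order within a host, while putting the hosts under deps lets go-task work on several hosts at the same time.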
But what is in that included CommonTasks.yml file? This is where the actual script is generated. It consists mainly of four sections:
- Set environment variables used as parameters
- Task with generated script to copy over flake
- Task with generated script to copy over secrets
- Task with nixos-rebuild command to rebuild
The variables are used to map the options created by the NixOS module to environment variables in the syntax of Go-Task. For example, this module option
host = mkOption {
  type = types.str;
  default = config.networking.hostName;
  description = "Host to deploy to";
  defaultText = "<config.networking.hostName>";
};
is used to populate the environment variable HOSTNAME, which will be used in various steps of the task as {{.HOSTNAME}}.
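A rough sketch of this mapping, assuming the values end up under the Taskfile's vars key: the cfg attribute set stands in for the evaluated module options of one host, and the user option and the REMOTE_USER variable are hypothetical names added for illustration.

```nix
# Evaluate with: nix-instantiate --eval --strict --json vars.nix
let
  # Stand-in for the evaluated deployment options of one host.
  # The values and the "user" option are made up for this sketch.
  cfg = {
    host = "ahorn";
    user = "root";
  };
in
{
  version = "3";
  # Each option value becomes a Taskfile variable.
  vars = {
    HOSTNAME = cfg.host;
    REMOTE_USER = cfg.user;
  };
  tasks.check-vars = {
    # Refuse to run if the variable is empty.
    preconditions = [
      { sh = ''[ -n "{{.HOSTNAME}}" ]''; msg = "HOSTNAME not set"; }
    ];
    # Nix leaves {{.NAME}} untouched; go-task expands it at run time.
    cmds = [ "echo deploying to {{.REMOTE_USER}}@{{.HOSTNAME}}" ];
  };
}
```

Nix only interpolates ${...}, so the go-task placeholders pass through the Nix evaluation unchanged and need no escaping.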
Copy flake
Copying over the flake is not strictly necessary, as we could just use nixos-rebuild with --target-host and be done with it, but I like to have a copy of the flake itself on the machines as it allows manual rebuilding and debugging should you happen to misconfigure the internet connection or for some other reason need to work on the machine locally.
This is the script it generates. You can see the variables set above being used in Go-Task's templating format:
source_path=
if ; then
source_path=
/fi
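Since the generated script is only partly visible here, the following is a hedged sketch of what such a copy task could look like, not a reconstruction of it. It assumes rsync as the transfer tool, and the variables LOCAL_FLAKE_SOURCE, REMOTE_USER and REMOTE_CONFIG_DIR are hypothetical names.

```nix
# Evaluate with: nix-instantiate --eval --strict --json deploy-flake.nix
let
  # Shell script kept as a Nix string. The {{.NAME}} placeholders are
  # left for go-task; Nix does not interpret them.
  script = ''
    source_path={{.LOCAL_FLAKE_SOURCE}}
    # A trailing slash makes rsync copy the contents of a directory
    # instead of the directory itself.
    if [ -d "$source_path" ]; then
      source_path="$source_path/"
    fi
    rsync --archive --delete "$source_path" \
      {{.REMOTE_USER}}@{{.HOSTNAME}}:{{.REMOTE_CONFIG_DIR}}
  '';
in
{
  tasks.deploy-flake = {
    desc = "Copy the flake to the remote host";
    cmds = [ script ];
  };
}
```

With --delete, files removed from the local flake also disappear on the remote side, so the remote copy stays identical to what was deployed.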
Rebuild
The rebuild itself is the simplest part. It is possible to use nixos-rebuild with the --target-host option for remote provisioning, but I want to build on the host itself in most cases. This is configurable, but the default is just a simple nixos-rebuild command executed via SSH.
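A sketch of such a default, assuming the flake was copied to a directory on the remote host beforehand. The remoteConfigDir option name, its value and the REMOTE_USER variable are hypothetical.

```nix
# Evaluate with: nix-instantiate --eval --strict --json rebuild.nix
let
  # Stand-in for a module option; name and value are made up.
  cfg.remoteConfigDir = "/var/src/lollypops";

  # Build on the target itself by running nixos-rebuild over SSH.
  # Nix fills in the directory, go-task fills in {{.NAME}} later.
  rebuildCmd = ''
    ssh {{.REMOTE_USER}}@{{.HOSTNAME}} \
      "nixos-rebuild switch --flake '${cfg.remoteConfigDir}#{{.HOSTNAME}}'"
  '';
in
{
  tasks.rebuild = {
    desc = "Rebuild the configuration on the remote host";
    cmds = [ rebuildCmd ];
  };
}
```

Because the command is an ordinary string in the attribute set, making it configurable only requires exposing it as a module option with this value as its default.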
Secrets management
The deployment of secrets calls a configurable password command (pass is used here) and pipes the output via SSH directly to the file where it is supposed to be written. The file is initially created with permissions so that only root can read/write it. If other permissions are configured, they are applied in a last step.
The process is repeated for every secret that is found to be configured in the lollypops.secrets.files option.
;
# Create directory if it does not exist
# Run pass command and pipe output
|
# Set permissions
It would of course be possible to simplify the script, e.g. by writing to a local temporary file and copying it over. I chose this approach in order to avoid having either a local file with the secret or a remote file with the secret with wrong permissions at any point in time. Even during a rebuild, there should be no possibility for other users on the local or the remote side to steal it.
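The three steps named in the script comments can be sketched as follows. This is an illustration under assumptions rather than the generated script: the secret name, path, owner, mode, the cmd attribute and the REMOTE_USER variable are made up, and the SSH login is assumed to be root.

```nix
# Evaluate with: nix-instantiate --eval --strict --json secrets.nix
let
  # Stand-in for lollypops.secrets.files; all values are hypothetical.
  secrets = {
    "nextcloud/admin" = {
      path = "/run/keys/nextcloud-admin";
      owner = "nextcloud";
      mode = "0400";
      cmd = "pass show nextcloud/admin";
    };
  };

  ssh = "ssh {{.REMOTE_USER}}@{{.HOSTNAME}}";

  # One script per secret. The secret only travels through the pipe and
  # is never written to a local file.
  mkSecretScript = name: s: ''
    # Create directory if it does not exist
    ${ssh} 'mkdir -p "$(dirname ${s.path})"'
    # Run pass command and pipe output; umask 077 creates a new file
    # readable and writable by its owner only
    ${s.cmd} | ${ssh} 'umask 077; cat > ${s.path}'
    # Set permissions
    ${ssh} 'chown ${s.owner} ${s.path} && chmod ${s.mode} ${s.path}'
  '';
in
{
  tasks.deploy-secrets.cmds =
    builtins.attrValues (builtins.mapAttrs mkSecretScript secrets);
}
```

Swapping pass for another backend then only means changing the cmd string, as long as the command prints the secret to standard output.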
User interface and interaction
First things first: Since all we are doing under the hood is using Go-Task's CLI with a generated Taskfile.yml, we get the benefit of having all the options it provides "for free". When executing, all arguments on the command line after -- will be passed to Go-Task instead of being treated as nix parameters.
The most interesting ones for us are probably the concurrency and dry run options, which allow provisioning multiple hosts at once in parallel or dry-running for debug runs.
Let's see what tasks we have available. Using --list-all with a flake with two nixosConfigurations (ahorn and porree) I get the following options:
Usage examples
The tasks are organized into a tree-like structure, which makes it easy to reason about them. Running a task will also run all of its dependencies, which allows provisioning all hosts at once, a single host, a single task or any combination of those.
all
├── ahorn
│   ├── check-vars
│   ├── deploy-flake
│   ├── deploy-secrets
│   └── rebuild
└── porree
    ├── check-vars
    ├── deploy-flake
    ├── deploy-secrets
    └── rebuild
A few examples. Deploy everything:
Execute two tasks of a host, in this case copying the flake over and rebuilding, without re-provisioning the secrets:
Provision two hosts in parallel:
You get the idea.
Possible extensions and more
I've been using the tool myself daily for a few months now, and it does everything I need currently. Still, I have a few ideas for features I'd like to add at some point.
One of the strengths is endless extensibility. For example, it would be possible to add more steps as pre-rebuild and post-rebuild commands that can be configured to do something like provisioning a VPS with Terraform.
For a more complex example, have a look at my configuration files and grep for lollypops to find many examples of how to use it, including secrets deployment. And of course, browse through the code of Lollypops. Additional options and documentation are over there.