Pablo Ovelleiro Corral

For nearly as long as I have been using NixOS, krops has been my deployment tool of choice. It is not as well-known as alternatives like NixOps or Colmena, but it has served me well for a long time. However, it does not yet support flakes natively and lacks some other features, like parallel deployments and the flexibility to run individual deployment steps.

I recently stumbled upon go-task and after playing a bit with it, I immediately had the idea of writing a deployment tool with it to address everything I was missing in krops. Lollypops (Lollypop Operations) has been in use successfully on all my machines since then and is still evolving.

Why another deployment tool?

There are already numerous deployment tools for NixOS out there, so the question arises why anyone would want to write another one. In short, I have tried most of them, and each one either lacked part of the feature set I wanted or had other limitations. Let me go over my deployment tool journey.

NixOps

The first deployment tool anyone will encounter when using NixOS beyond the simple nixos-rebuild switch is probably NixOps. It is more "official" than other choices and well-tested. However, while it surely has its place in other scenarios, it is not for me. First of all, it is not stateless: it keeps a SQLite database with the status of all machines, which limits deployment to a single central machine. Sure, I could sync the state across multiple machines, but that is an unnecessary complication in my opinion. Also, at least at the time I was using it, secrets deployment was very rudimentary and only supported clear-text secrets, which is not optimal and prevents me from pushing my NixOS configurations to a public repository. There are workarounds for this and NixOps can certainly be extended, but given its stateful nature I don't see the point of going this route for my use case.

Colmena

Colmena does a lot of things right. In fact, it seems to be a very nice piece of software and includes most of the features I would like to see in a modern NixOS deployment tool. However, there are also some limitations that keep me from switching to it. First off, I don't really like the choice of configuration format for deployments.

Instead of putting the deployment options in the configuration of the host itself, Colmena requires a separate flake output:

{
  colmena = {
    meta = {
      nixpkgs = import nixpkgs {
        system = "x86_64-linux";
      };
    };
    host-a = { name, nodes, pkgs, ... }: {
      boot.isContainer = true;
      time.timeZone = nodes.host-b.config.time.timeZone;
    };
    host-b = {
      deployment = {
        targetHost = "somehost.tld";
        targetPort = 1234;
        targetUser = "luser";
      };
      boot.isContainer = true;
      time.timeZone = "America/Los_Angeles";
    };
  };
}

I find this non-standardized flake output suboptimal. For example, having the deployment options for all hosts in one central output makes it complicated to move a host to a different flake. A more elegant solution, in my opinion, would be to make the deployment and secrets management part of the host itself (part of nixosConfigurations.<hostname>).

The second and probably more decisive factor for me is complexity. This might be highly subjective, but just like NixOps, Colmena consists of a substantial amount of code (in this case Rust) that does a lot behind the scenes. That might not be a problem in itself if you are willing to read it all or don't run into cases where you have to debug much, but in my opinion it is complexity that is not needed for the tasks we are trying to accomplish. The configuration of my systems and infrastructure is something I want to be able to understand down to the last detail, and this seems like an unnecessary source of complexity which, besides complicating debugging, makes building on the tool and extending it problematic. This aspect is one that led me to the last tool compared here, krops, which I was using before.

Krops

What is actually needed to deploy a NixOS system? In general, you will in most cases:

Decrypt/fetch and copy secrets to the target
Run nixos-rebuild switch
Optional, other steps like copying over the configuration

If you are thinking that you could do this with a simple bash script for each host, you are right. Krops is a tool that generates that script for you from a configuration file written in Nix.
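To make that concrete, here is a minimal sketch of what such a per-host script could look like. It is not what krops actually generates: the host name, secret name and paths are placeholders, and the DRY_RUN switch and the run helper are assumptions added only so the script can be tried without a real target.

```shell
#!/usr/bin/env bash
# Hypothetical per-host deployment script; every name and path is a placeholder.
REMOTE="${REMOTE:-root@host.example}"
SECRET_NAME="${SECRET_NAME:-hosts/example/secret}"
SECRET_PATH="${SECRET_PATH:-/var/src/secrets/secret}"
CONFIG_DIR="${CONFIG_DIR:-/var/src/config}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command line in dry-run mode, otherwise run it through bash.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$1"
  else
    bash -o pipefail -c "$1"
  fi
}

deploy() {
  # 1. Fetch the secret and stream it to the target, readable by its owner only.
  run "pass show '$SECRET_NAME' | ssh '$REMOTE' \"umask 077; cat > '$SECRET_PATH'\"" || return
  # 2. Optional: copy the configuration over.
  run "rsync --recursive --delete ./ '$REMOTE:$CONFIG_DIR/'" || return
  # 3. Rebuild and activate the configuration on the target.
  run "ssh '$REMOTE' 'nixos-rebuild switch'"
}

deploy
```

With the default DRY_RUN=1 the script only prints the three commands, which is a cheap way to check the quoting before pointing it at a real machine.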

In practice this means that you configure your systems and get a Nix file with multiple outputs, each one corresponding to a host or a group of hosts. To deploy, you then run:

# Generate deployment script
nix-build ./krops.nix -A my-hostname

# Deploy to the host
./result

This works very well and has minimal overhead. If something goes wrong, you can always look into the generated ./result, which is just a shell script, and execute the commands in there manually to find what is causing the problem. In fact, it works so well that this is what I have been using for multiple machines for quite some time now.

However, there are some drawbacks. For one, it is again a separate configuration for the deployment, even more so than with Colmena, because it lives in a separate file. Krops was not built with flakes in mind, and while it surely works with them, it would be much nicer to have it as a flake output. This is not possible without forking it, as there are impure statements in the code which are not allowed in a flake build.

A more serious drawback is that it is not parallel at all. You can find hacky workarounds to execute multiple copies of it at once, but in general the hosts have to be built one after another. That is only acceptable if you have a few hosts or a lot of patience. In addition, the secrets management is built around pass. I like that password manager, but I would prefer to be able to plug any secrets manager I like into the deployment tool. The secrets management itself is also quite rudimentary: secrets just get copied with root-only permissions to a specified directory, and from there it is your task to set the correct permissions or do with them whatever you want.

Lastly, of course there is a certain element of interest in writing this for the sake of learning how to do it.

Requirements

Let's recap what I am looking for and specify some requirements. This list is not exhaustive, but it contains the key points where existing solutions fall short for me and what is crucial for the tool to be usable for me.

Stateless: More specifically storing data on the machine that initializes the deployment (I'll call it the local host) is undesirable. The state should be on the host where we deploy to (the remote host).
Parallel: There should be the option to deploy to multiple hosts in parallel or do various steps on one host in parallel. Of course, we have to think about dependencies when running tasks asynchronously too.
Pluggable secrets backend: I've been using pass for now, but would like the option to use any backend or manager that has a CLI or another kind of API. Some interesting options would be Hashicorp Vault for larger setups and Bitwarden via its CLI tools for personal stuff.
Flakes-first configuration: Flakes are here to stay. The tool should support them as first-class citizens and make use of their outputs. This includes .apps for running the tool.
Deployment options in the host's configuration: The deployment and specification of secrets are part of the configuration of the host and should therefore be in its NixOS configuration.
Thin and simple: For lack of a better title, I want as much of the code as possible to be Nix code itself and the tool to be easily understandable. Looking at the source should explain its function to anyone who has a basic understanding of Nix and NixOS.
Debuggable: Not only should it be simple to understand what is happening, but I want good debugging capabilities. This includes showing all the commands that are being run, the ability to control output and the ability to run individual steps separately. Also, it should always be possible to still run nixos-rebuild on the remote host itself in case we can't deploy from another host or need remote and local to be the same machine.
Extensible: It only makes sense to think about extensibility. While all steps should have sensible defaults, I'd like to make them configurable. It should be possible to run additional commands before, during and after the deployment. This could be useful to do things like hardware provisioning with terraform or other tools.
Performant, Secure, Maintainable: The usual stuff, should be self-explanatory.

About go-task/task

https://taskfile.dev/ Task is a task runner / build tool that aims to be simpler and easier to use than, for example, GNU Make. Since it's written in Go, Task is just a single binary and has no other dependencies, which means you don't need to mess with any complicated install setups just to use a build tool.

Similar to a Makefile, the steps to be run are specified in a file, in this case written in YAML. The syntax is quite self-explanatory if you have used any form of YAML-based provisioning or CI. A simple Taskfile can look like this:

version: '3'

tasks:
  build:
    cmds:
      - go build -v -i main.go

  assets:
    cmds:
      - minify -o public/style.css src/css

There is also support for templating and more complex syntax structures, but we'll be using Nix for most of the configuration.

Basic Concept

While the YAML format has its limitations (e.g. the Norway problem) and there are more expressive formats out there, it has undoubtedly become an industry standard. So much so, in fact, that certain professions are jokingly being called "YAML Engineers".

For NixOS this has been a blessing, even though it was never a consideration. YAML is (99.99% of the time) a strict superset of JSON. Yes, JSON is valid YAML.

Mix this with the built-in Nix function toJSON and you have a pretty universal and effective way to generate configuration from Nix code for any application that understands YAML.
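As a rough illustration of that idea (not Lollypops' actual code), the following snippet turns a tiny, made-up Taskfile written as a Nix attribute set into JSON. It assumes nix-instantiate is on the PATH and uses its --json output, which should match what builtins.toJSON produces inside a Nix expression. The hello task and the render_taskfile helper are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical one-task Taskfile, written as a Nix attribute set.
TASKFILE_EXPR='{ version = "3"; tasks.hello.cmds = [ "echo hello from Nix" ]; }'

render_taskfile() {
  # Skip gracefully on machines without Nix installed.
  if ! command -v nix-instantiate >/dev/null 2>&1; then
    echo "nix-instantiate not found, skipping" >&2
    return 0
  fi
  # --eval evaluates the expression, --strict forces nested values,
  # --json prints the resulting value as JSON.
  nix-instantiate --eval --strict --json -E "$TASKFILE_EXPR"
}

render_taskfile
```

Since the result is JSON, it can be written to a file and handed to go-task with -t like any other Taskfile.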

Lollypops allows deploying two things: the system configuration and secrets. I will use this to make both part of a NixOS host's configuration and generate a Taskfile.yml from it dynamically, which go-task can execute.

Flakes first

Since I only use flake-based setups, Lollypops assumes this new style of NixOS configuration. In contrast to Colmena and others, I won't create an additional flake output though. It is all part of the nixosConfigurations.

nixosConfigurations = {
  host1 = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [
      lollypops.nixosModules.lollypops # Import the lollypops module
      ./configuration.nix # Other configuration
      {
        lollypops = { # Here goes our config for the host
          deployment = { /* Deployment configuration */ };
          secrets = { /* Secrets configuration */ };
        };
      }
    ];
  };
};

From here, it is "just" a matter of generating a Taskfile.yml from the options set and providing a flake app that can run it.

Implementation

Let's start with what the user will face: the final executable. For now, this is astonishingly simple. In fact, this is all that the app output does: it runs go-task from nixpkgs with the generated taskfile passed as the first parameter and any arguments from the user after that.

{
  drv = pkgs.writeShellScriptBin "go-task-runner" ''
    ${pkgs.go-task}/bin/task -t ${taskfile} "$@"
  '';
};

The taskfile is of course where most of the magic happens. In total, we want to generate a Taskfile.yml with the following tasks:

For each host:

Copy the configuration (flake) to it
Copy the secrets to it
Run the rebuild for the host

In fact, this is exactly what Lollypops generates. This is an excerpt for two hosts called porree and ahorn, shortened for clarity:

{
  "includes": {
    "ahorn": { "taskfile": "/nix/store/1vgrqx3gwd1srnzv1db3q6zi7sjsy0j0-CommonTasks.yml" },
    "porree": { "taskfile": "/nix/store/vdprj4fvd3ndyp7mdm8qgmif15n14bf2-CommonTasks.yml" }
  },
  "output": "prefixed",
  "silent": true,
  "version": "3",
  "tasks": {
    "ahorn": {
      "desc": "Provision host: ahorn",
      "cmds": [
        { "task": "ahorn:deploy-flake" },
        { "task": "ahorn:deploy-secrets" },
        { "task": "ahorn:rebuild" }
      ]
    },
    "porree": {
      "desc": "Provision host: porree",
      "cmds": [
        { "task": "porree:deploy-flake" },
        { "task": "porree:deploy-secrets" },
        { "task": "porree:rebuild" }
      ]
    },
    "all": {
      "deps": [
        { "task": "ahorn" },
        { "task": "porree" }
      ]
    }
  }
}

Apart from the steps outlined above, it makes use of some of Go-Task's features: by using dependencies and the include statement, most of the steps can be reused with parameters. In addition to the two hosts, there is a special all task, which, you guessed it, runs all of them.

But what is in that included CommonTasks.yml file? This is where the actual script is generated. It consists mainly of four sections:

Set environment variables used as parameters
Task with generated script to copy over flake
Task with generated script to copy over secrets
Task with nixos-rebuild command to rebuild

The variables are used to map the options created by the NixOS module to environment variables in the syntax of Go-Task. For example, this module option

host = mkOption {
  type = types.str;
  default = "${config.networking.hostName}";
  description = "Host to deploy to";
  defaultText = "<config.networking.hostName>";
};

is used to populate the environment variable HOSTNAME, which will be used in various steps of the task as {{.HOSTNAME}}.

Copy flake

Copying over the flake is not strictly necessary, as we could just use nixos-rebuild with --target-host and be done with it, but I like to have a copy of the flake itself on the machines as it allows manual rebuilding and debugging should you happen to misconfigure the internet connection or for some other reason need to work on the machine locally.

This is the script it generates. You can see the variables set above being used in Go-Task's templating format:

source_path={{.LOCAL_FLAKE_SOURCE}}
# A trailing slash makes rsync copy the directory's contents, not the directory itself
if test -d "$source_path"; then
  source_path=$source_path/
fi
/nix/store/704rbbcw1h45hfziknn4k9havadcls6v-rsync-3.2.7/bin/rsync \
  --checksum \
  --verbose \
  -e "ssh -l {{.REMOTE_USER}} -T" \
  -FD \
  --times \
  --perms \
  --recursive \
  --links \
  --delete-excluded \
  "$source_path" {{.REMOTE_USER}}@{{.REMOTE_HOST}}:{{.REMOTE_CONFIG_DIR}}

Rebuild

The rebuild itself is the simplest part. It is possible to use nixos-rebuild with the --target-host option for remote provisioning, but I want to build on the host itself in most cases. This is configurable, but the default is just a simple nixos-rebuild command executed via SSH.

ssh {{.REMOTE_USER}}@{{.REMOTE_HOST}} "nixos-rebuild {{.REBUILD_ACTION}} --flake '{{.REMOTE_CONFIG_DIR}}#{{.HOSTNAME}}'"

Secrets management

The deployment of secrets calls a configurable password command (pass used here) and pipes the output via SSH directly to the file where it is supposed to be written. The file is initially created with permissions so that only root can read/write it. If other permissions are configured they are applied in a last step.

The process is repeated for every secret that is configured in the lollypops.secrets.files option.

set -o pipefail -e; 

# Create directory if it does not exist
ssh {{.REMOTE_USER}}@{{.REMOTE_HOST}} 'umask 077; mkdir -p "$(dirname '/var/src/lollypops-secrets/alertmanager-ntfy/envfile')"'

# Run pass command and pipe output
/nix/store/0fx35js67m58llyyawf039880i1a7zbw-password-store-1.7.4/bin/pass \
nixos-secrets/porree/alertmanager-ntfy/envfile | ssh {{.REMOTE_USER}}@{{.REMOTE_HOST}} "umask 077; cat > '/var/src/lollypops-secrets/alertmanager-ntfy/envfile'"

# Set permissions
ssh {{.REMOTE_USER}}@{{.REMOTE_HOST}} "chown root:root '/var/src/lollypops-secrets/alertmanager-ntfy/envfile'"

It would of course be possible to simplify the script, e.g. by writing to a local temporary file and copying it over. I chose this approach in order to avoid having, at any point in time, either a local file containing the secret or a remote file containing the secret with the wrong permissions. Even during a rebuild, there should be no possibility for other users to steal it on either the local or the remote side.
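The effect of setting the umask before writing can be checked locally. This is only a sketch: a temporary directory stands in for the remote host, and the secret content is a dummy value.

```shell
#!/usr/bin/env bash
# Local demonstration only: a temporary directory stands in for the remote host.
demo_dir="$(mktemp -d)"
secret_file="$demo_dir/envfile"

# Same pattern as the remote side of the pipe. The subshell keeps the
# restrictive umask from leaking into the rest of the script.
printf 'TOKEN=example\n' | ( umask 077; cat > "$secret_file" )

# The first ten characters of `ls -l` are the file type and permission bits.
mode="$(ls -l "$secret_file" | cut -c1-10)"
echo "$mode"

rm -rf "$demo_dir"
```

On a typical Linux system this prints -rw-------, meaning the file is only ever readable and writable by its owner, from the moment it is created. There is no window in which it exists with wider permissions, which is the point of not using a separate chmod afterwards.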

User interface and interaction

First things first: Since all we are doing under the hood is use Go-Task's CLI with a generated Taskfile.yml, we get the benefit of having all the options it provides "for free". When executing, all arguments on the command line after -- will be passed to Go-Task instead of being treated as nix parameters.

nix run '.' -- --help
Options:
  -c, --color                       colored output. Enabled by default. Set flag to false or use NO_COLOR=1 to disable (default true)
  -C, --concurrency int             limit number tasks to run concurrently                                                          
  -d, --dir string                  sets directory of execution                                                                    
  -n, --dry                         compiles and prints tasks in the order that they would be run, without executing them         
  -x, --exit-code                   pass-through the exit code of the task command                                               
  -f, --force                       forces execution even when the task is up-to-date                                           
  -h, --help                        shows Task usage                                                                           
  -i, --init                        creates a new Taskfile.yaml in the current folder                                         
  -I, --interval string             interval to watch for changes (default "5s")                                             
  -l, --list                        lists tasks with description of current Taskfile                                        
  -a, --list-all                    lists tasks with or without a description                                              
  -o, --output string               sets output style: [interleaved|group|prefixed]                                       
      --output-group-begin string   message template to print before a task's grouped output                             
      --output-group-end string     message template to print after a task's grouped output                             
  -p, --parallel                    executes tasks provided on command line in parallel                                
  -s, --silent                      disables echoing                                                                  
      --status                      exits with non-zero exit code if any of the given tasks is not up-to-date        
      --summary                     show summary about a task                                                       
  -t, --taskfile string             choose which Taskfile to run. Defaults to "Taskfile.yml"                       
  -v, --verbose                     enables verbose mode                                                          
      --version                     show Task version                                                            
  -w, --watch                       enables watch of the given task

The most interesting ones for us are probably the concurrency and dry run options, which allow provisioning multiple hosts at once in parallel or dry-running for debug runs.

Let's see what tasks we have available. Using --list-all with a flake with two nixosConfigurations (ahorn and porree) I get the following options:

nix run '.' -- --list-all

* ahorn:                          Provision host: ahorn
* all:
* porree:                         Provision host: porree
* ahorn:check-vars:
* ahorn:deploy-flake:             Deploy flake repository to: ahorn
* ahorn:deploy-secrets:           Deploy secrets to: ahorn
* ahorn:rebuild:                  Rebuild configuration of: ahorn
* porree:check-vars:
* porree:deploy-flake:            Deploy flake repository to: porree
* porree:deploy-secrets:          Deploy secrets to: porree
* porree:rebuild:                 Rebuild configuration of: porree

Usage examples

The tasks are organized into a tree-like structure, which makes it easy to reason about them. Running a task will also run all of its dependencies, which allows provisioning all hosts at once, a single host, a single task or any combination of those.

 all
  ├── ahorn
  │  ├── check-vars
  │  ├── deploy-flake
  │  ├── deploy-secrets
  │  └── rebuild
  └── porree
     ├── check-vars
     ├── deploy-flake
     ├── deploy-secrets
     └── rebuild

A few examples. Deploy everything:

nix run '.' -v --show-trace -- all

Execute two tasks of a host, in this case copying the flake over and rebuilding, without re-provisioning the secrets:

nix run '.' -v --show-trace -- ahorn:deploy-flake ahorn:rebuild

Provision two hosts in parallel:

nix run '.' -v --show-trace -- ahorn porree --parallel

You get the idea.

Possible extensions and more

I've been using the tool myself daily for a few months now, and it does everything I need currently. Still, I have a few ideas for features I'd like to add at some point.

One of its strengths is extensibility. For example, it would be possible to add more steps as pre-rebuild and post-rebuild commands, which could be configured to do something like provisioning a VPS with terraform.

For a more complex example, have a look at my configuration files and grep for lollypops to find many examples of how to use it, including secrets deployment. And of course, browse through the code of Lollypops. Additional options and documentation are over there.

Lollypops: Let's build a deployment tool!