new deployment infra: CLOUD deployment backend for workbench
Integration with the infrastructure needed for P2P benchmarks
This issue is about the CLOUD deployment, for the LOCAL deployment see #3971. The parent issue of both is #3969.
Description
Right now the workbench uses NixOS Services to run a cluster of cardano-nodes. Due to limitations of the current infrastructure used for cloud deployments, running a cluster on the cloud where we can benchmark with P2P enabled requires OCI/Docker images (infrastructure limitation? or proposed solution?). The standard way of doing this is to describe and run the cluster with compose.yaml / docker-compose, using the images provided by the cardano-world repository.
New approach: Nomad (Documentation still in progress)
- [ ] Generate genesis -> upload to S3 -> patch job.json with artifact -> nomad job run
- [ ] Tracer-Node architecture, move from 1-1 to 1-N
  - This is not a problem for local deployments because a shared folder containing the Tracer socket can be mounted, but cloud environments are currently being tested with one_tracer_per_node=true, which means both run in the same Task/container/image.
  - To switch to one_tracer_per_node=false, Nodes need to forward (usually using SSH) a local socket to the Tracer's socket (Tracer does not support plain networking by design).
- [ ] Supply the cardano-node commit at runtime
  - To be able to benchmark any published cardano-node repo commit from any version of the workbench.
  - One option I'm thinking of is to add the Nomad backend bits as a flake output. If not, I have to resolve the mismatch between scripts like start.sh, which reference one very specific bash version, and the bash version I'm pushing to Nomad using the nix_installables stanza.
- [ ] Integrate heartbeats with Nomad
  - Add Nomad service checks using cardano-ping to the job specification so Nomad can handle part of the cluster failure checking load.
- [ ] The workbench, when cloud benchmarking, needs to compile cardano-node or other binaries that it won't run locally?
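The "job.json with artifact" step of the first item could look roughly like the sketch below. It assumes job.json follows Nomad's JSON job format (Job / TaskGroups / Tasks, with an Artifacts list holding GetterSource and RelativeDest entries). The function name, the destination directory and the S3 URL are hypothetical placeholders, not what the workbench actually uses.

```python
import copy
import json


def add_genesis_artifact(job, source_url, dest="local/genesis"):
    """Return a copy of a Nomad JSON job with one artifact added to every task.

    Assumes the Nomad JSON job layout: Job -> TaskGroups -> Tasks -> Artifacts.
    """
    patched = copy.deepcopy(job)  # never mutate the caller's job description
    artifact = {"GetterSource": source_url, "RelativeDest": dest}
    for group in patched["Job"].get("TaskGroups") or []:
        for task in group.get("Tasks") or []:
            artifacts = task.get("Artifacts") or []
            task["Artifacts"] = artifacts
            if artifact not in artifacts:  # keep the patch idempotent
                artifacts.append(artifact)
    return patched


if __name__ == "__main__":
    # Hypothetical minimal job; a real job.json has many more fields.
    example_job = {
        "Job": {
            "ID": "workbench-cluster",
            "TaskGroups": [{"Name": "nodes", "Tasks": [{"Name": "node-0"}]}],
        }
    }
    # Placeholder URL: the bucket and object names are assumptions.
    url = "s3::https://s3.amazonaws.com/example-bucket/genesis.tar.gz"
    print(json.dumps(add_genesis_artifact(example_job, url), indent=2))
```

The patched document would then be written back to job.json and submitted with nomad job run. Uploading the genesis files to S3 is left out because it depends on credentials and tooling not described here.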
"Abandoned" approach (kept here just in case or while I clean the docs)
Prerequisites
1. cardano-world
cardano-node should not depend on cardano-world as the former will be archived and merged inside the latter. See mono repo.
2. Use a Compose file (compose.yaml)
The DevOps team proposed two options for the next steps:
- Using plain docker-compose.yaml
- Using Arion to further automate the creation
With the below recommendation:
I'd personally choose to start off with arion. But with docker-compose.yaml (using podman), as a stepping stone, that's completely fine.
2.a. Entrypoints
As explained in packaging principles the entrypoints are part of the 4 layers of Packaging.
The entrypoints refer to the command that the OCI container runs when it starts. We are going to have a special case for benchmarking in the script that comes preinstalled with the cardano-node container image. See here.
2.b. Parameters
The parameters for the cardano-node executable inside the container are passed through compose.yaml, by defining the volumes attribute together with env_file and/or environment.
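As a rough illustration of those three attributes, the sketch below builds a single service entry and prints it as JSON, which is also valid YAML and can therefore be saved as a compose file. The image tag, the mount path inside the container, the file names and the environment variable names are all hypothetical; only the attribute names volumes, env_file and environment come from the text above.

```python
import json


def cardano_node_service(image, host_dir, container_dir,
                         env_file=None, environment=None):
    """Build one Compose service entry that passes parameters to the node."""
    service = {
        "image": image,
        # Bind-mount the host directory read-only into the container.
        "volumes": [f"{host_dir}:{container_dir}:ro"],
    }
    if env_file:
        service["env_file"] = [env_file]
    if environment:
        service["environment"] = dict(environment)
    return service


if __name__ == "__main__":
    compose = {
        "services": {
            "node-0": cardano_node_service(
                image="registry.ci.iog.io/cardano-node:example-tag",  # hypothetical tag
                host_dir="./run/node-0",      # hypothetical host path
                container_dir="/config",      # hypothetical mount point
                env_file="./node-0.env",      # hypothetical file
                environment={"EXAMPLE_PORT": "3001"},  # hypothetical variable
            )
        }
    }
    print(json.dumps(compose, indent=2))
```

Values from environment and env_file both end up as environment variables in the container, so which one to use is mostly a question of whether the values should live in the compose file itself.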
How to build the OCI images using cardano-world
1. Docker
Installing docker
$ nix-env -i docker
If you try to run the docker daemon from a nix-shell you have to use dockerd-rootless, which fails with the following error:
[rootlesskit:parent] error: failed to setup UID/GID map: newuidmap 677055 [0 1000 1 1 100000 65536] failed: newuidmap: write to uid_map failed: Operation not permitted
: exit status 1
Creating the image
$ git clone https://github.com/input-output-hk/cardano-world
$ cd cardano-world
$ nix run .#x86_64-linux.cardano.oci-images.cardano-node.copyToDockerDaemon
This creates a local image with the cardano-node executable:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.ci.iog.io/cardano-node bn8rapg74yhmd530id4qwzzdcbiz63cr 45d01dc8b973 N/A 606MB
The problem with using the locally created Docker image is that docker-compose has a bug: it always tries to fetch the image from the cloud registries. To avoid that, a registry service must be run locally and the image registered with that new local registry. See here.
2. Podman
Installing Podman
$ nix-env -i podman
$ podman images
Error: could not find a working conmon binary (configured options: [/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon]): invalid argument
$ nix-env -i podman conmon
$ podman images
Error: default OCI runtime "runc" not found: invalid argument
$ nix-env -i podman conmon runc
$ podman images
Error: command required for rootless mode with multiple IDs: exec: "newuidmap": executable file not found in $PATH
$ nix-env -i podman conmon runc shadow
$ podman images
ERRO[0000] running `/home/fmaste/.nix-profile/bin/newuidmap 113223 0 1000 1 1 100000 65536`: newuidmap: write to uid_map failed: Operation not permitted
Error: cannot setup namespace using "/home/fmaste/.nix-profile/bin/newuidmap": should have setuid or have filecaps setuid: exit status 1
Here is Red Hat's explanation of Rootless containers with Podman.
There are many steps to set up a rootless environment. See Basic Setup and Use of Podman in a Rootless environment. I could not make it work.
Creating the image
$ git clone https://github.com/input-output-hk/cardano-world
$ cd cardano-world
$ nix run .#x86_64-linux.cardano.oci-images.cardano-node.copyToPodman
Others
$ nix build .\#x86_64-linux.cardano.oci-images.cardano-node.copyToPodman
Resulting script:
$ cat result/bin/copy-to-podman
#!/nix/store/pbfraw351mksnkp2ni9c4rkc9cpp89iv-bash-5.1-p12/bin/bash
echo "Copy to podman image registry.ci.iog.io/cardano-node:ncn87201lsk14v3asizb9zgqlx70gc4f"
/nix/store/366b4jgvhsqzwzzljrqnig6zs6wck6jv-skopeo-1.5.2/bin/skopeo --insecure-policy copy nix:/nix/store/ncn87201lsk14v3asizb9zgqlx70gc4f-image-cardano-node.json containers-storage:registry.ci.iog.io/cardano-node:ncn87201lsk14v3asizb9zgqlx70gc4f
/nix/store/366b4jgvhsqzwzzljrqnig6zs6wck6jv-skopeo-1.5.2/bin/skopeo --insecure-policy inspect containers-storage:registry.ci.iog.io/cardano-node:ncn87201lsk14v3asizb9zgqlx70gc4f
Understanding the path
- Podman: The open-source default container engine used by Red Hat Enterprise Linux that allows you to develop, run, and manage OCI containers on Linux.
- Arion: A tool for building and running applications that consist of multiple docker containers using NixOS modules.
- Skopeo: A command-line utility that performs various operations on container images and image repositories.
- copyToPodman Nix attribute: Provided by nix2container, which offers an efficient container development workflow with images built by Nix: it doesn't write tarballs to the Nix store and allows skipping already pushed layers (without having to rebuild them).