Pipeline support for makes
@kamadorueda Let's consider adding pipeline support to makes so people can run all jobs associated with a pipeline.
- Consider parallelism
- Consider `needs:`
- Consider `artifacts:`
https://github.com/NixOS/nixpkgs/blob/master/lib/lists.nix#L432
https://github.com/nix-community/home-manager/blob/master/modules/lib/dag.nix#L35
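For context, both links point at dependency-ordering code: nixpkgs' `lib.lists.toposort` and home-manager's DAG module boil down to a topological sort over the job graph. A minimal Python sketch of the same idea (the job names are made up for illustration):

```python
# Topological ordering of jobs, in the spirit of nixpkgs' lib.lists.toposort.
# Job names below are hypothetical examples, not real Makes jobs.
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it `needs:` before it can run.
jobs = {
    "lint": set(),
    "build": {"lint"},
    "test": {"build"},
    "deploy": {"build", "test"},
}

order = list(TopologicalSorter(jobs).static_order())
print(order)  # every job appears after all of its dependencies
```

Jobs at the same "level" (no path between them) could also be run in parallel, which is exactly what Nix does for independent derivations.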
It's been working very nicely so far, though we are missing the documentation and a way to declare dependencies between jobs.
Once that's complete, do you plan on implementing a makesd + frontend fully integrated nix based CI?
Put the evaluation cache on a fuseFS and boom - expensive evaluation cost saved for all devs & scalability! :smile: /cc @nrdxp
cheers from bta
/cc @manveru
@blaggacao from a high-level view, Makes is Nix:
- Nix builds, basically. And when Nix builds, it does so in parallel (this is not GNU parallel; parallelization is orchestrated by Nix itself).
- At this point results are pushed to a remote binary cache (https://www.cachix.org/), if configured, so future builds on other machines can reuse the results.
Many tasks end here; we call those jobs pure (makeDerivation, makeTemplate derivatives). For example linting, building Python/Node.js packages, etc.
Now, some actions/jobs are impure by nature, for instance deploying to a Docker container registry or managing secrets. Nix can't do that alone (due to its purity). So what Makes does is cook a Bash script with Nix that performs the intended action, and then execute that script as a regular OS process.
The Makes concept is very simple, as you can see: (build with Nix) + (optionally execute the built artifacts).
The actual job orchestration happens at GitLab/GitHub/Jenkins, whichever provider you pick.
This issue aims mostly at being able to generate configuration files (gitlab-ci.yml, .github/workflows/xxx.yml, jenkins/jobs/*, etc) for such providers (GitLab/GitHub/Jenkins) out of the same configuration you use to configure Makes, so people don't have to maintain two sources of truth. It also makes it simple for people to create a CI/CD on the cloud.
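As a rough illustration of the single-source-of-truth idea, here is a hypothetical Python sketch that emits a minimal GitLab CI fragment from a list of job declarations. The job names and the output shape are invented; this is not Makes' actual generator:

```python
# Hypothetical: derive a GitLab CI config from the same job declarations
# used to configure the pipeline, so only one source of truth is maintained.
jobs = [
    {"name": "lintNix", "needs": []},
    {"name": "helloWorld", "needs": ["lintNix"]},
]

def to_gitlab_ci(jobs):
    # Emit one GitLab CI job per declaration, wiring `needs:` edges through.
    lines = []
    for job in jobs:
        lines.append(f"{job['name']}:")
        lines.append(f"  script: m . /{job['name']}")
        if job["needs"]:
            lines.append(f"  needs: [{', '.join(job['needs'])}]")
    return "\n".join(lines) + "\n"

print(to_gitlab_ci(jobs))
```

Adapters for GitHub Actions or Jenkins would consume the same `jobs` list and only swap the serialization step.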
For the moment, we think this stack choice (Nix -> Makes, GitLab/GitHub) is pretty much enough to orchestrate everything a big organization needs (ML, data pipelines, build, test, deploy, infra, k8s clusters, etc), and we are confident in this because we are living proof of it (https://gitlab.com/fluidattacks/product).
> Once that's complete, do you plan on implementing a makesd + frontend fully integrated nix based CI?
When the issue closes people will be able to execute a full pipeline/workflow with one single command on their computers, but I'm not sure if it'll be so fancy
> Put the evaluation cache on a fuseFS and boom - expensive evaluation cost saved for all devs & scalability! :smile: /cc @nrdxp
We currently put results on (https://www.cachix.org/) so nobody has to build twice!
- It can be easily configured in Makes as explained here: https://github.com/fluidattacks/makes#cache
- It's pretty scalable
- It's free for open source and cost-effective for private projects
> cheers from bta
cheers! :)
Thank you for your extensive response.
When asking about implications for the eval cache, I actually did not mean the build cache.
The eval cache (at least in flakes) is currently stored in sqlite. One nixpkgs evaluation is reported to take roughly 130MB of RAM.
It probably is not the first order of business, but it's something to keep an eye on, since I believe there is a chance to make Nix evaluations super responsive, even in bigger projects.
> When the issue closes people will be able to execute a full pipeline/workflow with one single command on their computers, but I'm not sure if it'll be so fancy
That's already super useful! I assume the python script will be the "orchestrator"?
> That's already super useful! I assume the python script will be the "orchestrator"?
The first prototype works like this:
- You configure the module system: https://github.com/fluidattacks/makes/blob/939a40963d820ad35a296d00cabe43cdce16eae7/makes.nix#L135-L149
You can try the examples below yourself if you want; they are part of the current Makes test suite.
Then you run it:
- From anywhere:
  $ m github:fluidattacks/makes@main /pipeline/example
- Or locally, as a developer:
  /path/to/makes $ m . /pipeline/example
Under the hood:
- Nix builds the pure jobs (derivations)
- Nix performs a topological ordering of the jobs to be executed (pending implementation)
- The Makes module system creates a main Bash script that represents the entire workflow/pipeline, and this main script runs the other Bash scripts (the jobs) sequentially
    $ m . /pipeline/example

    Makes v21.10-linux

    /nix/store/cxcz6mxw7p0cvp7rygwk39n8vnvxkb1r-pipeline-for-example

    [INFO] Running pipeline
    [INFO] Running job: /lintNix # This one is pure, so it's not executed, it was built by Nix already
    [INFO] Running job: /helloWorld 1 2 3 # This one should be executed, so we do it
    [INFO] Hello from Makes! Jane Doe.
    [INFO] You called us with CLI arguments: [ 1 2 3 ].
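The orchestration in that run (skip pure jobs, since Nix already built and thereby verified them, and execute the impure ones in order) could be sketched in Python roughly as follows; the job table is invented for illustration and is not Makes' real data model:

```python
# Sketch of a sequential pipeline orchestrator: pure jobs are satisfied by
# the Nix build itself, so only impure jobs get executed as OS processes.
import subprocess

def run_pipeline(pipeline):
    log = []
    for job in pipeline:
        log.append(f"[INFO] Running job: {job['name']}")
        if job["pure"]:
            continue  # already built (and therefore checked) by Nix
        subprocess.run(job["script"], check=True)
    return log

# Hypothetical job table mirroring the example run above.
pipeline = [
    {"name": "/lintNix", "pure": True, "script": None},
    {"name": "/helloWorld", "pure": False, "script": ["echo", "Hello from Makes!"]},
]

for line in run_pipeline(pipeline):
    print(line)
```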
We can make it as fancy as we want (custom Python front-end, makesd, replacing Bash with something tailored to do pretty logging, etc).
Docs for this are still pending; it's experimental and under development.
This is a very nice idea, indeed! I'd like to expand on the first principles of that, in the sense that any job DAG very quickly becomes a congested one, and we want to schedule it out across available resources (e.g. lambda functions?).
So I can easily see how the Nix-evaluated DAG is used for creating the scheduling metadata of a single "run", which is then passed on as JSON to the actual scheduler, such as Apache Airflow.
The likes of Zeebe, I think, have a special format to instantiate a pipeline, but it would indeed be feasible in principle to map that Nix-produced DAG onto a bootstrapped BPMN model that in turn can be command-and-controlled by Zeebe.
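As a sketch of that hand-off, assuming a neutral JSON shape (invented here; Airflow or Zeebe would each need their own adapter), the Nix-produced DAG could be serialized and its edges recovered on the scheduler side:

```python
# Hypothetical hand-off: serialize a job DAG to JSON so an external
# scheduler (Airflow, Zeebe, ...) can rebuild the dependency edges.
import json

dag = {
    "jobs": [
        {"name": "lintNix", "needs": []},
        {"name": "helloWorld", "needs": ["lintNix"]},
    ],
}

payload = json.dumps(dag, indent=2)
print(payload)

# The scheduler side recovers (dependency, dependent) edges from the JSON.
edges = [
    (dep, job["name"])
    for job in json.loads(payload)["jobs"]
    for dep in job["needs"]
]
print(edges)  # [('lintNix', 'helloWorld')]
```

Mapping those edges onto BPMN sequence flows (for Zeebe) or Airflow task dependencies would be the adapter's job.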
There is also: https://github.com/NixOS/nixpkgs/pull/97392
- https://github.com/willmcgugan/textual
- https://github.com/willmcgugan/rich
A fellow project (in spirit): https://academic.oup.com/gigascience/article/9/11/giaa121/5987272 (and how it solves pipelines).