Pipeline support for makes
@kamadorueda Let's consider adding pipeline support to makes so people can run all jobs associated with a pipeline.
- Consider parallelism
- Consider `needs:`
- Consider `artifacts:`
https://github.com/NixOS/nixpkgs/blob/master/lib/lists.nix#L432
https://github.com/nix-community/home-manager/blob/master/modules/lib/dag.nix#L35
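For context, both links point at dependency-ordering code: nixpkgs' `lib.lists.toposort` and home-manager's DAG module boil down to a topological sort over the job graph. A minimal Python sketch of the same idea (the job names are made up for illustration):

```python
# Topological ordering of jobs, in the spirit of nixpkgs' lib.lists.toposort.
# Job names below are hypothetical examples, not real Makes jobs.
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it `needs:` before it can run.
jobs = {
    "lint": set(),
    "build": {"lint"},
    "test": {"build"},
    "deploy": {"build", "test"},
}

order = list(TopologicalSorter(jobs).static_order())
print(order)  # every job appears after all of its dependencies
```

Jobs at the same "level" (no path between them) could also be run in parallel, which is exactly what Nix does for independent derivations.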
It's been working very nicely so far, though we are missing the documentation and a way to declare dependencies between jobs.
Once that's complete, do you plan on implementing a makesd + frontend fully integrated nix based CI?
Put the evaluation cache on a fuseFS and boom - expensive evaluation cost saved for all devs & scalability! :smile: /cc @nrdxp
cheers from bta
/cc @manveru
@blaggacao from a high-level view, Makes is Nix:
- Nix builds, basically. And when Nix builds, it does so in parallel (this is not GNU parallel; parallelization is orchestrated by Nix itself).
- At this point results are pushed to a remote binary cache (https://www.cachix.org/), if configured, so future builds on other machines can reuse the results.
Many tasks end here; we call those jobs pure (makeDerivation, makeTemplate derivatives). For example linting, building Python/Node.js packages, etc.
Now, some actions/jobs are impure by nature, for instance deploying to a Docker container registry or managing secrets. Nix can't do that alone (due to its purity). So what Makes does is cook a Bash script with Nix that performs the intended action, and then execute that script as a regular OS process.
The Makes concept is very simple, as you can see: (build with Nix) + (optionally execute the built artifacts).
The actual job orchestration happens at GitLab/GitHub/Jenkins, whichever provider you pick.
This issue aims mostly at being able to generate configuration files (gitlab-ci.yml, .github/workflows/xxx.yml, jenkins/jobs/*, etc) for such providers (GitLab/GitHub/Jenkins) out of the same configuration you use to configure Makes, so people don't have to maintain two sources of truth. It also makes it simple for people to create a CI/CD on the cloud.
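As a rough illustration of the single-source-of-truth idea, here is a hypothetical Python sketch that emits a minimal GitLab CI fragment from a list of job declarations. The job names and the output shape are invented; this is not Makes' actual generator:

```python
# Hypothetical: derive a GitLab CI config from the same job declarations
# used to configure the pipeline, so only one source of truth is maintained.
jobs = [
    {"name": "lintNix", "needs": []},
    {"name": "helloWorld", "needs": ["lintNix"]},
]

def to_gitlab_ci(jobs):
    # Emit one GitLab CI job per declaration, wiring `needs:` edges through.
    lines = []
    for job in jobs:
        lines.append(f"{job['name']}:")
        lines.append(f"  script: m . /{job['name']}")
        if job["needs"]:
            lines.append(f"  needs: [{', '.join(job['needs'])}]")
    return "\n".join(lines) + "\n"

print(to_gitlab_ci(jobs))
```

Adapters for GitHub Actions or Jenkins would consume the same `jobs` list and only swap the serialization step.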
For the moment, we think this stack choice (Nix -> Makes, GitLab/GitHub) is pretty much enough to orchestrate everything a big organization needs (ML, data pipelines, build, test, deploy, infra, k8s clusters, etc), and we are confident in this because we are living proof of it (https://gitlab.com/fluidattacks/product).
> Once that's complete, do you plan on implementing a makesd + frontend fully integrated nix based CI?
When the issue closes people will be able to execute a full pipeline/workflow with one single command on their computers, but I'm not sure if it'll be so fancy
> Put the evaluation cache on a fuseFS and boom - expensive evaluation cost saved for all devs & scalability! :smile: /cc @nrdxp
We currently put results on (https://www.cachix.org/) so nobody has to build twice!
- It can be easily configured in Makes as explained here: https://github.com/fluidattacks/makes#cache
- It's pretty scalable
- It's free for open source and cost-effective for private projects
> cheers from bta
cheers! :)
Thank you for your extensive response.
When asking about implications for the eval cache, I actually did not mean the build cache.
The eval cache (at least in flakes) is currently stored in sqlite. One nixpkgs evaluation is reported to take roughly 130MB of RAM.
It probably is not the first order of business, but it's something to keep an eye on, since I believe there is a chance to make Nix evaluations super responsive, even in bigger projects.
> When the issue closes people will be able to execute a full pipeline/workflow with one single command on their computers, but I'm not sure if it'll be so fancy
That's already super useful! I assume the python script will be the "orchestrator"?
> That's already super useful! I assume the python script will be the "orchestrator"?
The first prototype works like this:
- You configure the module system: https://github.com/fluidattacks/makes/blob/939a40963d820ad35a296d00cabe43cdce16eae7/makes.nix#L135-L149
You can try the examples below yourself if you want; they are part of the current Makes test suite.
Then you run it:
- From anywhere:
  $ m github:fluidattacks/makes@main /pipeline/example
- Or locally, as a developer:
  /path/to/makes $ m . /pipeline/example
Under the hood:
- Nix builds the pure jobs (derivations)
- Nix performs a topological ordering of the jobs to be executed (pending implementation)
- The Makes module system creates a main Bash script that represents the entire workflow/pipeline, and this main script runs the other Bash scripts (the jobs) sequentially
    $ m . /pipeline/example

    Makes v21.10-linux

    /nix/store/cxcz6mxw7p0cvp7rygwk39n8vnvxkb1r-pipeline-for-example

    [INFO] Running pipeline
    [INFO] Running job: /lintNix # This one is pure, so it's not executed, it was built by Nix already
    [INFO] Running job: /helloWorld 1 2 3 # This one should be executed, so we do it
    [INFO] Hello from Makes! Jane Doe.
    [INFO] You called us with CLI arguments: [ 1 2 3 ].
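The orchestration in that run (skip pure jobs, since Nix already built and thereby verified them, and execute the impure ones in order) could be sketched in Python roughly as follows; the job table is invented for illustration and is not Makes' real data model:

```python
# Sketch of a sequential pipeline orchestrator: pure jobs are satisfied by
# the Nix build itself, so only impure jobs get executed as OS processes.
import subprocess

def run_pipeline(pipeline):
    log = []
    for job in pipeline:
        log.append(f"[INFO] Running job: {job['name']}")
        if job["pure"]:
            continue  # already built (and therefore checked) by Nix
        subprocess.run(job["script"], check=True)
    return log

# Hypothetical job table mirroring the example run above.
pipeline = [
    {"name": "/lintNix", "pure": True, "script": None},
    {"name": "/helloWorld", "pure": False, "script": ["echo", "Hello from Makes!"]},
]

for line in run_pipeline(pipeline):
    print(line)
```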
We can make it as fancy as we want (custom Python front-end, makesd, replacing Bash with something tailored to do pretty logging, etc).
Docs for this are still pending; it's experimental and under development.
This is a very nice idea, indeed! I'd like to expand on the first principles of that, in the sense that any job DAG very quickly becomes a congested one, and we want to schedule it out across available resources (e.g. lambda functions?).
So I can easily see how the Nix-evaluated DAG is used for creating the scheduling metadata of a single "run", which is then passed on as JSON to the actual scheduler, such as Apache Airflow.
The likes of Zeebe, I think, have a special format to instantiate a pipeline, but it would indeed be feasible in principle to map that Nix-produced DAG onto a bootstrapped BPMN model that in turn can be command-and-controlled by Zeebe.
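As a sketch of that hand-off, assuming a neutral JSON shape (invented here; Airflow or Zeebe would each need their own adapter), the Nix-produced DAG could be serialized and its edges recovered on the scheduler side:

```python
# Hypothetical hand-off: serialize a job DAG to JSON so an external
# scheduler (Airflow, Zeebe, ...) can rebuild the dependency edges.
import json

dag = {
    "jobs": [
        {"name": "lintNix", "needs": []},
        {"name": "helloWorld", "needs": ["lintNix"]},
    ],
}

payload = json.dumps(dag, indent=2)
print(payload)

# The scheduler side recovers (dependency, dependent) edges from the JSON.
edges = [
    (dep, job["name"])
    for job in json.loads(payload)["jobs"]
    for dep in job["needs"]
]
print(edges)  # [('lintNix', 'helloWorld')]
```

Mapping those edges onto BPMN sequence flows (for Zeebe) or Airflow task dependencies would be the adapter's job.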
There is also: https://github.com/NixOS/nixpkgs/pull/97392
- https://github.com/willmcgugan/textual
- https://github.com/willmcgugan/rich
A fellow project (in spirit): https://academic.oup.com/gigascience/article/9/11/giaa121/5987272 (and how it solves pipelines).