hydra
hydra copied to clipboard
Allow building content-addressed derivations with hydra - reprise
This is an attempt to merge #875 by @thufschmitt into current master
.
I don't have much experience with Nix internals nor languages involved here, but I tried my best to understand what I was merging and doing the correct fixes where needed, I hope it makes all sense.
All tests except evaluator/evaluate-oom-job.t
are passing, and that one is also failing on master
on my machine so probably the cause is not related to this PR.
I've been experimenting building some derivations from nixpkgs
and everything seems to work but probably there are some corner cases which I've not considered.
@Ericson2314 @SuperSandro2000 IIRC you were interested into this, maybe you can help me finding these corner cases.
With this and the other PR I couldn't get the builds to work and always got evaluation errors in the following style:
in job ‘kibana’:
error: not an absolute path: 'ARRAY(0x58d7868)/0wfc6i459l68cx55y5iljyk0dnxz8m43-nixos-system-kibana-22.05.20220626.cd90e77.drv'
in job ‘stream’:
error: not an absolute path: 'ARRAY(0x58d7868)/xlqjavnr9spp83vpa0xvfa82srq3lhq3-microvm-cloud-hypervisor-stream.drv'
in job ‘gitea’:
error: not an absolute path: 'ARRAY(0x58d7868)/q1al5iqqhszrz86bl2aq4gngywhx4l4f-microvm-cloud-hypervisor-gitea.drv'
Also I get a wild ARRAY(0x5bdc4f8)
text in the top left corner.
Could we split out the DB/Frontend code into a separate PR? This way we could have that earlier.
On that note, why can path
be NULL now? Doesn't this mean I can get builds where there is no path under "Details"? That doesn't seem right.
Also, would it be possible to differentiate between a build that wasn't performed because it was cached input-adressed and output-adressed (I hope I got the wording right here…)
Could we split out the DB/Frontend code into a separate PR? This way we could have that earlier.
I'm not sure how, but definitely need to split that PR indeed
On that note, why can path be NULL now? Doesn't this mean I can get builds where there is no path under "Details"? That doesn't seem right.
Yes, that's a bit of a hack. IIRC that's needed because the BuiltOutputs
are created before the path is built, so we don't know the output paths yet. We fill it afterwards once the build is done. In practice I think that the database never really sees the empy path
though because everythong happens in a single transaction that gets rolled back if the build doesn't succeed.
would it be possible to differentiate between a build that wasn't performed because it was cached input-adressed and output-adressed (I hope I got the wording right here…)
Yes, that would be a great thing to have. Currently Hydra doesn't know about the cut-off builds (as far as it is concerned, they are freshly built paths), but we'll need to give it that knowledge at some point
With this and the other PR I couldn't get the builds to work and always got evaluation errors in the following style:
Strange :thinking: Can you get that on a fresh hydra instance?
@dasj When you build CA derivations substantially you see empty paths only for pending steps:
When the final derivation get built everything looks normal:
The only exception that I found is when you close Hydra during a build, when you restart some steps are shown as aborted and they are (obviously) still with empty paths:
@SuperSandro2000 if your errors occur even in a fresh install could you please provide a minimal example replicating that errors?
Wouldn't it make sense to display the drv paths, then? This way we don't have the ugly comma thingy we have right now
Wouldn't it make sense to display the drv paths, then? This way we don't have the ugly comma thingy we have right now
Totally, I just got lazy in the initial implementation :stuck_out_tongue:
Can you get that on a fresh hydra instance?
The hydra instance is in a nixos container with a fresh database from thufschmitt's PR. It could be that the shared daemon is making trouble.
@SuperSandro2000 if your errors occur even in a fresh install could you please provide a minimal example replicating that errors?
https://gitea.c3d2.de/c3d2/nix-config/src/branch/master/hosts/hydra/hydra.nix#L4-L40
@grahamc @edolstra @dasJ (randomly pinging the people who seem to be merging here :D ) I would really like this to get through one way or another, but I kinda get that the PR as it is is no fun at all to review. I think it could be split in smaller easier-to-understand PRs (and hopefully easier to get to a clean state where they also improve the state of the codebase), but that would require a non trivial bit of work (esp. if we end-up having to maintain a big graph of interdependent PRs), so it would be great to know whether that's worth the extra work. So my question is: Assuming we can digest that into smaller PRs, will you be able/willing to do the review work that comes with it?
I'd love to have this feature in Hydra and having it well-integrated. Problem is that I'm not too good with the C++ code and the daemon protocol so I'm not too sure how good I can review these. That's also why I asked to split out the frontend and db code because that I know very well. Also cc @ajs124 because you can merge as well
Thanks to @dasJ (who helped me a lot with Perl though he also raised the issue 😅) I improved the UI. Non-CA derivations are shown exactly as before, instead for CA derivations the following is shown:
While it's building:
^ Note that here the usual line "Output store paths" is not shown.
After:
^ Now that line it's shown again.
In the end I chose to add a new row for the step outputs, initially I tried doing this querying the store from Perl to understand if a derivation was CA, I think that this way is much cleaner.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/tweag-nix-dev-update-33/20048/1
in job ‘kibana’: error: not an absolute path: 'ARRAY(0x58d7868)/0wfc6i459l68cx55y5iljyk0dnxz8m43-nixos-system-kibana-22.05.20220626.cd90e77.drv'
~~I think this was caused by having ca-derivations disabled in the nix-daemon due to some priority problem on my end. I think the error message should be improved here and/or errors should properly be handled to prevent other people from running into this.~~
Never mind. I thought it started to evaluate but it somehow ran into a timeout and now I get the same error message again.
Trying this PR out in prod (terrible idea I know) and it seems like there is no database migration for it yet
The following manual migration seems to work for me:
ALTER TABLE BuildOutputs ALTER COLUMN path DROP NOT NULL;
ALTER TABLE BuildStepOutputs ALTER COLUMN path DROP NOT NULL;
ALTER TABLE BuildStepOutputs ADD contentAddressed BOOLEAN NOT NULL DEFAULT 'f';
In my very limited testing it seems to work, including with remote building of regular ~~and CA derivations~~ (which is something that the old patch had issues with) :+1:
Never mind. I thought it started to evaluate but it somehow ran into a timeout and now I get the same error message again.
TBH I have no idea how to debug this (and however I didn't try to replicate it yet).
Trying this PR out in prod (terrible idea I know) and it seems like there is no database migration for it yet
Ooops...sorry, I should have written to not do so, I thought about it but then I forgot. The reason is that I wanted to be sure about the ultimate changes in the tables before. Sorry again.
However I'm super happy to see that there is so much interest into this. I think that tracking everything as comments here is better but I've also created a matrix room if you prefer.
The chat doesn’t seem to be public unfortunately
The chat doesn’t seem to be public unfortunately
Try again please.
However I just added a new "Content addressed" field in the details.
Having success with remote building and this patch: https://github.com/DarkKirb/hydra/commit/b3eedbffd8560b7775130477c2816d6645a9983b
I am not sure if copying step->drv->inputSrcs is needed. I don’t know why copying the derivation outputs back to the machine running hydra-queue-runner needs to be explicit, I think it was working before even though this particular code was not touched?
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/how-to-make-nixpkgs-more-eco-friendly-use-less-resources/20976/12
in job ‘kibana’: error: not an absolute path: 'ARRAY(0x58d7868)/0wfc6i459l68cx55y5iljyk0dnxz8m43-nixos-system-kibana-22.05.20220626.cd90e77.drv'
did you find any time to look into or what could I do to debug this?
Using this together with https://github.com/DarkKirb/hydra/commit/b3eedbffd8560b7775130477c2816d6645a9983b, I've experienced hydra-queue-runner silently hanging inside getBuiltOutputs
; turns out I was missing the magic update from 44e1efff7. With these two and some trivial cherry-picking I finally have a hydra that builds content-addressed derivations along with normal ones + can do it on remote builders + uses a recent Nix! Thanks a lot for this updated MR, can't wait for it to land.
Do you have a branch with all of these combined?
https://github.com/t184256/hydra/commits/nix-ca-reprise (d9c0957) is what I ended up running when I mashed it all up together. But consider these just tips for those who want to try this PR in its current form to save them some debugging time; a proper rebase of it onto the latest master would be the a better solution.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/content-addressed-nix-call-for-testers/12881/219
Based on the above comments fron @t184256 I merged this PR into the current master, rebased two patches that no longer applied as https://github.com/SuperSandro2000/hydra/commit/e43b298a57f85ecb89e0a17d1ba11b9dd0871ca5 and https://github.com/SuperSandro2000/hydra/commit/c2685b4bca917c1f6c077e359d55c1b121ee8fe2 and applied the other commits done by @DarkKirb . See https://github.com/SuperSandro2000/hydra/commits/ca-derivations-reprise for full history. I am locally applying it via the following patches I append in an overlay.
patches = [
# ca-derivation support
(let
patch = fetchpatch {
url = "https://github.com/NixOS/hydra/pull/1228.diff";
excludes = [ "t/**" ];
hash = "sha256-wPiMSgETENmFtmITV41/EAo9yar0aPpcZXkQium3zf4=";
};
in runCommand "" {
nativeBuildInputs = [ patchutils ];
} /* bash */ ''
filterdiff -p1 -x src/hydra-queue-runner/build-remote.cc ${patch} > $out
filterdiff -p1 -i src/hydra-queue-runner/build-remote.cc --hunks=x2 ${patch} |
sed -e 's/worker_proto::read(\*localStore\, from\, Phantom<DrvOutputs> {});/WorkerProto::Serialise<DrvOutputs>::read(\*localStore\, rconn);/' >> $out
'')
# based on https://github.com/t184256/hydra/commit/a5ff881c96061841dfc82a264e7dd5201a973512
(fetchpatch {
url = "https://github.com/SuperSandro2000/hydra/commit/e43b298a57f85ecb89e0a17d1ba11b9dd0871ca5.patch";
hash = "sha256-ad/s2tZytsAy1rZkgMAim0JkvoVVSWSIyYERF3eYvts=";
})
# based on https://github.com/t184256/hydra/commit/359479b27948be1543b483979232d0fb84b08330
(fetchpatch {
url = "https://github.com/SuperSandro2000/hydra/commit/c2685b4bca917c1f6c077e359d55c1b121ee8fe2.patch";
hash = "sha256-9mCT+y6fRyAs/O7SKJP6dzhmjnKWEVe3R72TeaP4DFk=";
})
(fetchpatch {
url = "https://github.com/NixOS/hydra/commit/9004ba944a556365d75fc8c45cd6b70c384c1f9a.patch";
hash = "sha256-jPW1tvZejpBTSnPI4xfMj7gPfS7R41XU2YQ/fiPb6As=";
})
(fetchpatch {
url = "https://github.com/DarkKirb/hydra/commit/51c2319f653df748adfbc7a5fa7440205ad884df.patch";
hash = "sha256-jO2kSh/hnkect0IdhauMdsSeMZ10/U/K+xYfrznQ4bA=";
})
];
Bugs I have found so far:
- ~~
/evals
is just empty, nothing is logged~~ I had a faulty nginx location regex - when remote builders are used that do not have ca-derivations enabled, hydra just loops forever trying to use them
Nov 26 03:21:03 hydrogen hydra-queue-runner[1401522]: checking the queue for builds > 0...
Nov 26 03:21:03 hydrogen hydra-queue-runner[1401522]: loading build 10725 (sandro:nixos-config:nixos.hydrogen-simd.system)
Nov 26 03:21:03 hydrogen hydra-queue-runner[1401522]: queue monitor: error:
Nov 26 03:21:03 hydrogen hydra-queue-runner[1401522]: … while loading build 10725:
Nov 26 03:21:03 hydrogen hydra-queue-runner[1401522]: error: experimental Nix feature 'ca-derivations' is disabled; use '--extra-experimental-features ca-derivations' to override
Has someone a patch floating around for fetching .doi's aka https://hydra.example.de/realisations/sha256:a62128132508a3a32eef651d6467695944763602f226ac630543e947d9feb140!out.doi
over a http binary cache?
Found one over at https://github.com/edolstra/nix-serve/pull/21 and ported that to hydra https://github.com/SuperSandro2000/hydra/commit/9e528b2faee8a2bcdb25290ff4fcbc0423c9fbd2
I didn't e2e test this yet but in the browser the json was displayed 🎉
@aciceri is it possible to allow maintainers to push to this branch? Otherwise I can close it an open a new PR.
@Ericson2314 Apparently this is not possible since the fork is owned by an organization and not an user.
If you send me the list of maintainers I can manually give them permissions (to the entire repo I fear, but it wouldn't be a problem since we always used it only for this PR). Feel free to write me at @aciceri:nixos.dev
.
Otherwise I can create a personal fork and allow maintainers to push. But unless I can do this without deleting the PR (but simply editing the branch) it would be useless and you could directly do it.