hydra
                                
                                 hydra copied to clipboard
                                
                                    hydra copied to clipboard
                            
                            
                            
                        Improve UX with CA derivations
Fix #838
There should be a couple of UI artifacts (because the output paths were showing up in a couple places but we don't know them anymore), but everything seems to be working fine otherwise (I managed to build a couple generations of the nixpkgs-minimal jobset with this).
~~Depends on #874 (could probably be made to work without it, but I built it on top of it and the effectively touch the same code)~~
~~Also depends on NixOS/nix#4477~~ Depends on #1349
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/tweag-nix-dev-update-8/11758/1
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/tweag-nix-dev-update-10/12587/1
I should have done this as a review, sorry.
In the C++ I'm seeing a lot of in-place complication. Would it be possible to clean up the campsite a bit, and make some functions and break up the behavior a bit? I'm increasingly uncertain about understanding those pieces of the code.
I also wonder if there would be anything interesting in tests for:
- requesting the Builds page. It is fairly easy to test controllers using https://github.com/NixOS/hydra/blob/master/t/Controller/Jobset/channel.t#L15-L16 :` https://github.com/NixOS/hydra/blob/master/t/Controller/Jobset/channel.t#L36
- what happens when trying to build a content-addressed derivation but CA is not enabled?
- what happens when building a CA derivation but its already been built and pushed to a binary cache? Here is a test that checks similar behavior: https://github.com/NixOS/hydra/blob/master/t/queue-runner/notifications.t
- how does Hydra handle it if Hydra is ca-enabled but its builders are not?
Thanks for the review @grahamc
In the C++ I'm seeing a lot of in-place complication. Would it be possible to clean up the campsite a bit, and make some functions and break up the behavior a bit? I'm increasingly uncertain about understanding those pieces of the code.
A large part of the complication is annoyingly due to the fact that we need to support both the nix+ca-derivations and nix-without-ca-derivations cases, which means two slightly different code paths. But it should be indeed possible to factor at least part of it out.
I also wonder if there would be anything interesting in tests for:
requesting the Builds page. It is fairly easy to test controllers using https://github.com/NixOS/hydra/blob/master/t/Controller/Jobset/channel.t#L15-L16 :` https://github.com/NixOS/hydra/blob/master/t/Controller/Jobset/channel.t#L36 what happens when trying to build a content-addressed derivation but CA is not enabled? what happens when building a CA derivation but its already been built and pushed to a binary cache? Here is a test that checks similar behavior: https://github.com/NixOS/hydra/blob/master/t/queue-runner/notifications.t how does Hydra handle it if Hydra is ca-enabled but its builders are not?
I’d have to think it a bit more, but I think there's a number of things worth testing here indeed.
I've added a bunch of tests (and fixed a bunch of related issues by the way).
I didn't test
3. what happens when building a CA derivation but its already been built and pushed to a binary cache? Here is a test that checks similar behavior: https://github.com/NixOS/hydra/blob/master/t/queue-runner/notifications.t 4. how does Hydra handle it if Hydra is ca-enabled but its builders are not?
because
- substituting ca derivations isn't supported yet in hydra (and doing so will probably require a bit of work)
- A “mixed” network of machines (some ca-enabled, some not) isn't supported by Nix atm (I've just opened https://github.com/NixOS/nix/issues/4755 to track it)
The test hydra instance just showed some weird failures with fixed-output derivations (https://hydra.ngi0.nixos.org/build/171 − the cbindgen vendor derivation should build just fine but is marked as failed without anything appearing in the log). This needs a bit of investigation − Might be linked to https://github.com/NixOS/hydra/pull/875#discussion_r622156625 because maybe that bit of code also rewrites the output hashes of FO derivations.
The test hydra instance just showed some weird failures with fixed-output derivations (https://hydra.ngi0.nixos.org/build/171 − the cbindgen vendor derivation should build just fine but is marked as failed without anything appearing in the log). This needs a bit of investigation − Might be linked to #875 (comment) because maybe that bit of code also rewrites the output hashes of FO derivations.
OK, looks like these are “legitimate” has mismatches. So the only issue is that the error isn’t properly reported − which I think is a Nix issue
It would be really great if whatever the underlying issue is has an accompanying test wherever the fix belongs :) these sort of mystery problems are exhausting 😔
On April 29, 2021, GitHub @.***> wrote:
The test hydra instance just showed some weird failures with fixed- output derivations (https://hydra.ngi0.nixos.org/build/171 − the cbindgen vendor derivation should build just fine but is marked as failed without anything appearing in the log). This needs a bit of investigation − Might be linked to #875 (comment) https://github.com/NixOS/hydra/pull/875#discussion_r622156625 because maybe that bit of code also rewrites the output hashes of FO derivations.
OK, looks like these are “legitimate” has mismatches. So the only issue is that the error isn’t properly reported − which I think is a Nix issue
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NixOS/hydra/pull/875#issuecomment-829117758, or unsubscribe <https://github.com/notifications/unsubscribe- auth/AAASXLE7UTSXLSFPVILDEATTLEYFRANCNFSM4YCDMVTQ>.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/tweag-nix-dev-update-11/12886/1
Possible issue with remote builders and hydra. (note: there should be no CA builds occurring) Using hydra:
 src = pkgs.fetchFromGitHub {
    owner = "regnat";
    repo = "hydra";
    rev = "2090d3a0899db4b63d3fbf93545be0020385a165";
    sha256 = "sha256-GrYR/xX/kWxxB0VoHKj0c3Pw7A6RLPexDkVNNVa7kn4=";
};
Getting errors like:
possibly transient failure building ‘/nix/store/XYZ.drv’ on ‘[email protected]’: error: dependency '/nix/store/3fs5wqlj9ipv7wjx4av81skmpcdjjh3d-jq-1.6-dev' of '/nix/store/XYZ.drv' does not exist, and substitution is disabled
Reverting to package = hydra-unstable; restores behavior as expected with regards to remote building. All builds have __contentAddressed = false. Any thoughts, leads? I attempted setting use-substitutes, useSubstitutes, and builders-use-substitutes to no effect. Only solution was to revert.
Getting errors like:
possibly transient failure building ‘/nix/store/XYZ.drv’ on ‘[email protected]’: error: dependency '/nix/store/3fs5wqlj9ipv7wjx4av81skmpcdjjh3d-jq-1.6-dev' of '/nix/store/XYZ.drv' does not exist, and substitution is disabled
We’ve also started getting the same on https://hydra.ngi0.nixos.org/build/202 .
That might (quite ironically) be related to https://github.com/NixOS/hydra/pull/875/commits/d7a1b6b982bf64e560998ee6d041d0da66e518c8
Running this branch now with ca-derivation builds, but getting errors in the UI like this: Caught exception in Hydra::Controller::Build->build "[31;1merror:[0m path '[33;1m/1vbp723njr8wq6m404hzkfmqck0vk73gcgr6pjc78il65mh6p9mn[0m' is not in the Nix store at /nix/store/62znwr88yc5mi5v0qxmzp9rd84km68q6-hydra-2021-05-03/libexec/hydra/lib/Hydra/Controller/Build.pm line 73.". Seems to be building anyway though, which wasn't the case with hydra-unstable.
Not sure if that's to be addressed in a future patch or if I hit some edge case.
Edit: I made a new jobset and now it seems solved (for that new jobset).
@tomberek can you try 854dcec ? I think it should fix your issue.
@Mindavi there were some known issues with earlier commits in this branch where the database wasn’t properly filled with all the required informations, causing this kind of issue (and which persisted after an upgrade because the incomplete data was still there). Any chance that’s the case for you?
I guess I broke it with trying to build with ca-derivations first and only then finding out hydra doesn't support it and applying thid patchset.
It seems to be ok now though, with a fresh jobset, apart from the (already expected) ui artifacts (notably, empty package names).
This has now been running pretty smoothly for a while for me. However, there's one thing that's now starting to happen:
    Aborted: error: Trying to register a realisation of 'sha256:5a7f41bac1b8abb186eb4f0ec1a76ee90809fabbdd620bde7ea0ddb66cb96e6a!out', but we already have another one locally.
      Local: /nix/store/2489p9qk8i8wi627maxix9w15li2y890-systemd-247.6
      Remote: /nix/store/dvbv3d914ii22vnr277bslzq8lnwz68x-systemd-247.6
I already have systemd built locally, but it appears to be trying to substitute it anyways? And the remote appears to have another hash as well. I'm not sure what's happening here, but it shouldn't be trying to substitute it, I guess.
This has happened for some different packages over the last few days, e.g. also for glibc.
I have the binary cache for the ngi0 machine that's building the ca derivations enabled, I guess those might also have something to do with this?
Versions:
- nix (Nix) 2.4pre20210707_02dd6bb
- hydra: d20639a322e340770c9672e0181e70e48214a21c (latest commit on this PR as of today)
The issue where realisations cannot be registered might be related to separateDebugInfo (or more specifically, the --build-id flag), since I can imagine that that can cause ca derivations to be mismatched, while otherwise the inputs appear the same. Not 100% sure how it's generated though, so this may be a red herring.
systemd does not have separateDebugInfo enabled, but uses the flag for itself to build this file: /lib/systemd/boot/efi/linuxx64.elf.stub.
Since nix tries to rebuild even if I try to do a build manually, I don't think this is a hydra-only problem.
It seems that this issue occurs mostly by building staging regularly. I'll have to find 2 versions of nixpkgs which will clearly reproduce this to help debug this issue at some point.
Should we get this merged and working with 2.4?
I've run this for a while and even though it's not perfect, it does seem quite ok. Getting this in the main codebase may ensure that it doesn't get too stale, since it now needs a rebase with lots of subtle changes (since I'm not familiar with the codebase it was too hard to get right for me). It was not stable enough for me to keep running, since it kept getting stuck due to the issue described above, where a realisation couldn't be registered. The only way I found to solve that was to do a garbage collection, which is not great.
I think getting this in may also lead to more people testing it out and ironing out the last issues (no idea how many people run hydra though). I wouldn't say that it's wise to run the whole of nixpkgs on this yet, but it is good enough for a first cut IMO.
More testers would be nice, but it doesn't seem like that's going to happen on the short term, and this will get more stale over time.
Depends on how much time the author has, of course :).
Thanks for the reminder @Ericson2314 @Mindavi .
I’ve merged it with master and cleaned a couple things to make the changes (sligtly) easier to grasp (partially answering https://github.com/NixOS/hydra/pull/875#issuecomment-828411639).
@grahamc @edolstra it would be awesome if you could find some bit of time to give this a look :)
Pinging again :) Really hope we can see this merged, and then discuss QAing Nixpkgs!
Needed this with nix-2.6.0pre20220117_fc2443a.
@edolstra @thufschmitt @grahamc we should do...something about this!
Hydra now uses Nix 2.6 so this can shrink a little bit on rebase/merge
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/tweag-nix-dev-update-27/18433/1
@thufschmitt I 'm really interested into getting this upstream and I'm available to work on it. For now I tried downloading your branch (nix-ca, what is is nix-ca2 instead?) and updgrading to nix 2.8.1 and it worked.
My idea was to rebasing it into the current master (or the opposite is bette?). Is there anything I need to know? What do you suggest?
@aciceri that's very good news. I'll try to schedule a call with you to discuss it if that's OK so that I can give you an overview of the current state (as much as I remember, it's been a while 😬)
I am planning to setup a hydra with this branch in the next weeks. I don't really have good knowledge in the languages used nor do I have to much spare time for yet another project but I just wanted to let you know that all your works are highly appreciated and people are looking with great anticipation on ca.
Closing in favor of #1228
Reopening because I can push to this one.