hydra icon indicating copy to clipboard operation
hydra copied to clipboard

Improve UX with CA derivations

Open thufschmitt opened this issue 4 years ago • 42 comments

Fix #838

There should be a couple of UI artifacts (because the output paths were showing up in a couple places but we don't know them anymore), but everything seems to be working fine otherwise (I managed to build a couple generations of the nixpkgs-minimal jobset with this).

~~Depends on #874 (could probably be made to work without it, but I built it on top of it and the effectively touch the same code)~~

~~Also depends on NixOS/nix#4477~~ Depends on #1349

thufschmitt avatar Feb 23 '21 09:02 thufschmitt

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-8/11758/1

nixos-discourse avatar Mar 02 '21 14:03 nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-10/12587/1

nixos-discourse avatar Apr 21 '21 11:04 nixos-discourse

I should have done this as a review, sorry.

In the C++ I'm seeing a lot of in-place complication. Would it be possible to clean up the campsite a bit, and make some functions and break up the behavior a bit? I'm increasingly uncertain about understanding those pieces of the code.

grahamc avatar Apr 28 '21 12:04 grahamc

I also wonder if there would be anything interesting in tests for:

  1. requesting the Builds page. It is fairly easy to test controllers using https://github.com/NixOS/hydra/blob/master/t/Controller/Jobset/channel.t#L15-L16 :` https://github.com/NixOS/hydra/blob/master/t/Controller/Jobset/channel.t#L36
  2. what happens when trying to build a content-addressed derivation but CA is not enabled?
  3. what happens when building a CA derivation but its already been built and pushed to a binary cache? Here is a test that checks similar behavior: https://github.com/NixOS/hydra/blob/master/t/queue-runner/notifications.t
  4. how does Hydra handle it if Hydra is ca-enabled but its builders are not?

grahamc avatar Apr 28 '21 12:04 grahamc

Thanks for the review @grahamc

In the C++ I'm seeing a lot of in-place complication. Would it be possible to clean up the campsite a bit, and make some functions and break up the behavior a bit? I'm increasingly uncertain about understanding those pieces of the code.

A large part of the complication is annoyingly due to the fact that we need to support both the nix+ca-derivations and nix-without-ca-derivations cases, which means two slightly different code paths. But it should be indeed possible to factor at least part of it out.

I also wonder if there would be anything interesting in tests for:

requesting the Builds page. It is fairly easy to test controllers using https://github.com/NixOS/hydra/blob/master/t/Controller/Jobset/channel.t#L15-L16 :` https://github.com/NixOS/hydra/blob/master/t/Controller/Jobset/channel.t#L36 what happens when trying to build a content-addressed derivation but CA is not enabled? what happens when building a CA derivation but its already been built and pushed to a binary cache? Here is a test that checks similar behavior: https://github.com/NixOS/hydra/blob/master/t/queue-runner/notifications.t how does Hydra handle it if Hydra is ca-enabled but its builders are not?

I’d have to think it a bit more, but I think there's a number of things worth testing here indeed.

thufschmitt avatar Apr 28 '21 13:04 thufschmitt

I've added a bunch of tests (and fixed a bunch of related issues by the way).

I didn't test

3. what happens when building a CA derivation but its already been built and pushed to a binary cache? Here is a test that checks similar behavior: https://github.com/NixOS/hydra/blob/master/t/queue-runner/notifications.t

4. how does Hydra handle it if Hydra is ca-enabled but its builders are not?

because

  • substituting ca derivations isn't supported yet in hydra (and doing so will probably require a bit of work)
  • A “mixed” network of machines (some ca-enabled, some not) isn't supported by Nix atm (I've just opened https://github.com/NixOS/nix/issues/4755 to track it)

thufschmitt avatar Apr 28 '21 19:04 thufschmitt

The test hydra instance just showed some weird failures with fixed-output derivations (https://hydra.ngi0.nixos.org/build/171 − the cbindgen vendor derivation should build just fine but is marked as failed without anything appearing in the log). This needs a bit of investigation − Might be linked to https://github.com/NixOS/hydra/pull/875#discussion_r622156625 because maybe that bit of code also rewrites the output hashes of FO derivations.

thufschmitt avatar Apr 29 '21 08:04 thufschmitt

The test hydra instance just showed some weird failures with fixed-output derivations (https://hydra.ngi0.nixos.org/build/171 − the cbindgen vendor derivation should build just fine but is marked as failed without anything appearing in the log). This needs a bit of investigation − Might be linked to #875 (comment) because maybe that bit of code also rewrites the output hashes of FO derivations.

OK, looks like these are “legitimate” has mismatches. So the only issue is that the error isn’t properly reported − which I think is a Nix issue

thufschmitt avatar Apr 29 '21 10:04 thufschmitt

It would be really great if whatever the underlying issue is has an accompanying test wherever the fix belongs :) these sort of mystery problems are exhausting 😔

On April 29, 2021, GitHub @.***> wrote:

The test hydra instance just showed some weird failures with fixed- output derivations (https://hydra.ngi0.nixos.org/build/171 − the cbindgen vendor derivation should build just fine but is marked as failed without anything appearing in the log). This needs a bit of investigation − Might be linked to #875 (comment) https://github.com/NixOS/hydra/pull/875#discussion_r622156625 because maybe that bit of code also rewrites the output hashes of FO derivations.

OK, looks like these are “legitimate” has mismatches. So the only issue is that the error isn’t properly reported − which I think is a Nix issue

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NixOS/hydra/pull/875#issuecomment-829117758, or unsubscribe <https://github.com/notifications/unsubscribe- auth/AAASXLE7UTSXLSFPVILDEATTLEYFRANCNFSM4YCDMVTQ>.

grahamc avatar Apr 29 '21 10:04 grahamc

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-11/12886/1

nixos-discourse avatar May 04 '21 14:05 nixos-discourse

Possible issue with remote builders and hydra. (note: there should be no CA builds occurring) Using hydra:

 src = pkgs.fetchFromGitHub {
    owner = "regnat";
    repo = "hydra";
    rev = "2090d3a0899db4b63d3fbf93545be0020385a165";
    sha256 = "sha256-GrYR/xX/kWxxB0VoHKj0c3Pw7A6RLPexDkVNNVa7kn4=";
};

Getting errors like:

possibly transient failure building ‘/nix/store/XYZ.drv’ on ‘[email protected]’: error: dependency '/nix/store/3fs5wqlj9ipv7wjx4av81skmpcdjjh3d-jq-1.6-dev' of '/nix/store/XYZ.drv' does not exist, and substitution is disabled

Reverting to package = hydra-unstable; restores behavior as expected with regards to remote building. All builds have __contentAddressed = false. Any thoughts, leads? I attempted setting use-substitutes, useSubstitutes, and builders-use-substitutes to no effect. Only solution was to revert.

tomberek avatar May 11 '21 06:05 tomberek

Getting errors like:

possibly transient failure building ‘/nix/store/XYZ.drv’ on ‘[email protected]’: error: dependency '/nix/store/3fs5wqlj9ipv7wjx4av81skmpcdjjh3d-jq-1.6-dev' of '/nix/store/XYZ.drv' does not exist, and substitution is disabled

We’ve also started getting the same on https://hydra.ngi0.nixos.org/build/202 .

That might (quite ironically) be related to https://github.com/NixOS/hydra/pull/875/commits/d7a1b6b982bf64e560998ee6d041d0da66e518c8

thufschmitt avatar May 15 '21 06:05 thufschmitt

Running this branch now with ca-derivation builds, but getting errors in the UI like this: Caught exception in Hydra::Controller::Build->build "[31;1merror:[0m path '[33;1m/1vbp723njr8wq6m404hzkfmqck0vk73gcgr6pjc78il65mh6p9mn[0m' is not in the Nix store at /nix/store/62znwr88yc5mi5v0qxmzp9rd84km68q6-hydra-2021-05-03/libexec/hydra/lib/Hydra/Controller/Build.pm line 73.". Seems to be building anyway though, which wasn't the case with hydra-unstable.

Not sure if that's to be addressed in a future patch or if I hit some edge case.

Edit: I made a new jobset and now it seems solved (for that new jobset).

Mindavi avatar May 29 '21 12:05 Mindavi

@tomberek can you try 854dcec ? I think it should fix your issue.

@Mindavi there were some known issues with earlier commits in this branch where the database wasn’t properly filled with all the required informations, causing this kind of issue (and which persisted after an upgrade because the incomplete data was still there). Any chance that’s the case for you?

thufschmitt avatar May 31 '21 13:05 thufschmitt

I guess I broke it with trying to build with ca-derivations first and only then finding out hydra doesn't support it and applying thid patchset.

It seems to be ok now though, with a fresh jobset, apart from the (already expected) ui artifacts (notably, empty package names).

Mindavi avatar May 31 '21 16:05 Mindavi

This has now been running pretty smoothly for a while for me. However, there's one thing that's now starting to happen:

    Aborted: error: Trying to register a realisation of 'sha256:5a7f41bac1b8abb186eb4f0ec1a76ee90809fabbdd620bde7ea0ddb66cb96e6a!out', but we already have another one locally.
      Local: /nix/store/2489p9qk8i8wi627maxix9w15li2y890-systemd-247.6
      Remote: /nix/store/dvbv3d914ii22vnr277bslzq8lnwz68x-systemd-247.6

I already have systemd built locally, but it appears to be trying to substitute it anyways? And the remote appears to have another hash as well. I'm not sure what's happening here, but it shouldn't be trying to substitute it, I guess.

This has happened for some different packages over the last few days, e.g. also for glibc.

I have the binary cache for the ngi0 machine that's building the ca derivations enabled, I guess those might also have something to do with this?

Versions:

  • nix (Nix) 2.4pre20210707_02dd6bb
  • hydra: d20639a322e340770c9672e0181e70e48214a21c (latest commit on this PR as of today)

Mindavi avatar Aug 06 '21 07:08 Mindavi

The issue where realisations cannot be registered might be related to separateDebugInfo (or more specifically, the --build-id flag), since I can imagine that that can cause ca derivations to be mismatched, while otherwise the inputs appear the same. Not 100% sure how it's generated though, so this may be a red herring.

systemd does not have separateDebugInfo enabled, but uses the flag for itself to build this file: /lib/systemd/boot/efi/linuxx64.elf.stub.

Since nix tries to rebuild even if I try to do a build manually, I don't think this is a hydra-only problem.

It seems that this issue occurs mostly by building staging regularly. I'll have to find 2 versions of nixpkgs which will clearly reproduce this to help debug this issue at some point.

Mindavi avatar Sep 18 '21 05:09 Mindavi

Should we get this merged and working with 2.4?

Ericson2314 avatar Oct 09 '21 01:10 Ericson2314

I've run this for a while and even though it's not perfect, it does seem quite ok. Getting this in the main codebase may ensure that it doesn't get too stale, since it now needs a rebase with lots of subtle changes (since I'm not familiar with the codebase it was too hard to get right for me). It was not stable enough for me to keep running, since it kept getting stuck due to the issue described above, where a realisation couldn't be registered. The only way I found to solve that was to do a garbage collection, which is not great.

I think getting this in may also lead to more people testing it out and ironing out the last issues (no idea how many people run hydra though). I wouldn't say that it's wise to run the whole of nixpkgs on this yet, but it is good enough for a first cut IMO.

More testers would be nice, but it doesn't seem like that's going to happen on the short term, and this will get more stale over time.

Depends on how much time the author has, of course :).

Mindavi avatar Oct 10 '21 15:10 Mindavi

Thanks for the reminder @Ericson2314 @Mindavi .

I’ve merged it with master and cleaned a couple things to make the changes (sligtly) easier to grasp (partially answering https://github.com/NixOS/hydra/pull/875#issuecomment-828411639).

@grahamc @edolstra it would be awesome if you could find some bit of time to give this a look :)

thufschmitt avatar Oct 20 '21 08:10 thufschmitt

Pinging again :) Really hope we can see this merged, and then discuss QAing Nixpkgs!

Ericson2314 avatar Dec 01 '21 18:12 Ericson2314

Needed this with nix-2.6.0pre20220117_fc2443a.

t184256 avatar Jan 19 '22 08:01 t184256

@edolstra @thufschmitt @grahamc we should do...something about this!

Ericson2314 avatar Mar 01 '22 02:03 Ericson2314

Hydra now uses Nix 2.6 so this can shrink a little bit on rebase/merge

Ericson2314 avatar Mar 01 '22 02:03 Ericson2314

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-27/18433/1

nixos-discourse avatar Apr 01 '22 11:04 nixos-discourse

@thufschmitt I 'm really interested into getting this upstream and I'm available to work on it. For now I tried downloading your branch (nix-ca, what is is nix-ca2 instead?) and updgrading to nix 2.8.1 and it worked. My idea was to rebasing it into the current master (or the opposite is bette?). Is there anything I need to know? What do you suggest?

aciceri avatar Jun 08 '22 08:06 aciceri

@aciceri that's very good news. I'll try to schedule a call with you to discuss it if that's OK so that I can give you an overview of the current state (as much as I remember, it's been a while 😬)

thufschmitt avatar Jun 10 '22 12:06 thufschmitt

I am planning to setup a hydra with this branch in the next weeks. I don't really have good knowledge in the languages used nor do I have to much spare time for yet another project but I just wanted to let you know that all your works are highly appreciated and people are looking with great anticipation on ca.

SuperSandro2000 avatar Jun 10 '22 13:06 SuperSandro2000

Closing in favor of #1228

thufschmitt avatar Jun 30 '22 09:06 thufschmitt

Reopening because I can push to this one.

Ericson2314 avatar Dec 04 '23 18:12 Ericson2314