hydra icon indicating copy to clipboard operation
hydra copied to clipboard

Deduplicate code between Hydra and Nix

Open Ericson2314 opened this issue 3 years ago • 8 comments

Currently Hydra duplicates lots of functionality that is in Nix. This is bad because it leads to either the Nix or Hydra versions of a feature being undertested and broken (e.g. remote builds are very finicky), or missing features on one side (Hydra doesn't support ssh-ng://).

The solution is to dedup, fixing both problems, and lowing our maintenance burden across the board.

I will try to break this down into steps, updating this list accordingly

Easier

  • [x] Generic SSH logic

    • Nix: src/libstore/ssh.cc in Nix
    • Hydra: top of src/hydra-queue-runner/build-remote.cc

    Notable differences: Hydra passes a number of extra flags to ssh, evidently based on sysadmin war stories.

  • [x] Legacy SSH protocol

    • Nix: src/libstore/legacy-ssh-store.cc in Nix
    • Hydra: Much of src/hydra-queue-runner/build-remote.cc

    Notable differences: Hydra doesn't issue builds blocking and synchronously, so it can cancel them and handle logs as they occur. Probably other things, TBD.

    • [x] Split up logic so subsequent refactors are easier to both write and review https://github.com/NixOS/hydra/pull/1180

    • [x] ServeProto::Command::QueryValidPath

      • Nix: https://github.com/NixOS/nix/pull/6134
      • Hydra: https://github.com/NixOS/hydra/pull/1165
    • [x] ServeProto::Command::ImportPaths

    • [x] ServeProto::Command::BuildDerivation

      • Hydra part-way: https://github.com/NixOS/hydra/pull/1318
    • [x] ServeProto::Command::QueryPathInfos

      • Nix: https://github.com/NixOS/nix/pull/9560
    • [x] ServeProto::Command::DumpStorePath

  • [ ] Use Store abstraction after previous two are complete. https://github.com/NixOS/hydra/pull/1200

  • [x] Use the same notion of a Machine config for remote builders. https://github.com/nixos/nix/commit/ebc9f36a8111ddecc8e265e8a6a70048218f244d was a Nix-side refactor to prepare for this, but then the Hydra side was not done.

    • [x] #1341 Use Nix's Machine type
    • [x] #1386 Use Nix's Machine parser

    See also #484

    See also https://github.com/NixOS/nix/issues/5638, where we would like more of the machine stuff to go in the Store config. That ties this to the previous checkbox.

Farther off

  • [ ] Duplicate Scheduling logic, Nix's build/*.cc vs Hydra

CC @rickynils @Ma27 @grahamc

Ericson2314 avatar Feb 20 '22 17:02 Ericson2314

Note to fix this, it would be very convenient to use a local build of Nix during Hydra development. I will try to hack that up.

edit it is indeed possible!

Ericson2314 avatar Feb 20 '22 17:02 Ericson2314

https://discourse.nixos.org/t/developing-a-system-that-replaces-nix-remote-build/22388/11 lists a huge number of issues with the current remote building situation that makes using Nix in large institutional contexts difficult. CC @elaforge

The immediate goals of this issue are to shrink the amount of the code an increase testing coverage by moving us towards a world where hydra and remote building both battle-test the same code paths. But a further goal would be to, be improving the quality and flexibility of the build scheduling code, make experimentation with the alternative architectures suggested in that that thread easier.

Ericson2314 avatar Oct 31 '22 12:10 Ericson2314

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/developing-a-system-that-replaces-nix-remote-build/22388/13

nixos-discourse avatar Oct 31 '22 12:10 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2022-12-02-nix-team-meeting-minutes-13/23731/1

nixos-discourse avatar Dec 05 '22 09:12 nixos-discourse

Related: https://github.com/NixOS/hydra/pull/1200

  • @thufschmitt: agreed with the idea, the maintenance overhead of duplicated code is significant
    • @edolstra: the big duplication is that Hydra has its own build loop, which
      • @thufschmitt: why does that exist in the first place?
      • @edolstra: for optimisation. later it was not obvious how to port it back to Nix
  • @edolstra: related: https://github.com/NixOS/hydra/pull/1200
  • @thufschmitt: key question: is anyone willing to spend significant time on this? sounds like months of work
    • @fricklerhandwerk: this would be a worthwhile to have funding for, because it unblocks lots of further development, such as RFC92 (dynamic derivations) and stabilising RFC62 (content-addressed derivations), separate store types
      • @Ericson2314: we also need resources for reviews, otherwise the work will never land
        • right now it ultimately revolves around trust. we could leverage different processes, such as delegating more involved reviews to pairs of team members

fricklerhandwerk avatar Feb 10 '23 12:02 fricklerhandwerk

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-02-10-nix-team-meeting-minutes-31/25438/1

nixos-discourse avatar Feb 13 '23 15:02 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-03-27-nix-team-meeting-minutes-44/26759/1

nixos-discourse avatar Mar 27 '23 13:03 nixos-discourse

#1340 #1339 got us a big closer on this too.

Ericson2314 avatar Jan 23 '24 15:01 Ericson2314