cachix icon indicating copy to clipboard operation
cachix copied to clipboard

Sometimes `cachix watch-exec` fails with `cachix: <<loop>>`

Open exarkun opened this issue 1 year ago • 13 comments

Describe the bug

Sometimes a CircleCI step fails with a final line of output:

cachix: <<loop>>

The cachix watch-exec ... command exits with non-zero status failing the CI job. The final built object does appear to arrive at the cache (although it could also be that some later interaction added it, not the failing interaction - I'm not sure).

The CircleCI configuration is like (with simplifications and the real cache name elided):

jobs:
  build:
    docker:
      - image: "nixos/nix:2.22.1"

    environment:
      NIX_CONFIG: |
        experimental-features = nix-command flakes
        accept-flake-config = true

      CACHIX_NAME: "..."

    steps:
      - checkout
      - run:
          name: "Set up Caching"
          command: |
            NIXPKGS="nixpkgs/nixos-24.05"
            nix profile install --impure $NIXPKGS#cachix
            cachix --version
            cachix use "${CACHIX_NAME}"
      - run:
          name: "Build & Test"
          command: |
            # Run the build with cachix watching so that any outputs are
            # automatically sent to the cache.
            cachix watch-exec "${CACHIX_NAME}" -- nix build .#pkg-junit-xml --print-build-logs

      - run:
          name: "Docs"
          command: |
            cachix watch-exec "${CACHIX_NAME}" -- nix build .#pkg.doc --print-build-logs

The flake inputs are

Inputs:
├───agenix: github:ryantm/agenix/8d37c5bdeade12b6479c85acd133063ab53187a0?narHash=sha256-2T7CHTqBXJJ3ZC6R/4TXTcKoXWHcvubKNj9SfomURnw%3D (2024-05-09 21:32:35)
│   ├───darwin: github:lnl7/nix-darwin/4b9b83d5a92e8c1fbfd8eb27eda375908c11ec4d?narHash=sha256-gzGLZSiOhf155FW7262kdHo2YDeugp3VuIFb4/GGng0%3D (2023-11-24 03:11:34)
│   │   └───nixpkgs follows input 'agenix/nixpkgs'
│   ├───home-manager: github:nix-community/home-manager/3bfaacf46133c037bb356193bd2f1765d9dc82c1?narHash=sha256-7ulcXOk63TIT2lVDSExj7XzFx09LpdSAPtvgtM7yQPE%3D (2023-12-20 23:00:17)
│   │   └───nixpkgs follows input 'agenix/nixpkgs'
│   ├───nixpkgs: github:NixOS/nixpkgs/54aac082a4d9bb5bbc5c4e899603abfb76a3f6d6?narHash=sha256-%2BtFNwMvlXLbJZXiMHqYq77z/RfmpfpiI3yjL6o/Zo9M%3D (2023-12-19 19:15:32)
│   └───systems: github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e?narHash=sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768%3D (2023-04-09 08:27:08)
├───flake-parts: github:hercules-ci/flake-parts/e5d10a24b66c3ea8f150e47dfdb0416ab7c3390e?narHash=sha256-yzcRNDoyVP7%2BSCNX0wmuDju1NUCt8Dz9%2BlyUXEI0dbI%3D (2024-05-02 09:10:30)
│   └───nixpkgs-lib: https://github.com/NixOS/nixpkgs/archive/50eb7ecf4cd0a5756d7275c8ba36790e5bd53e33.tar.gz?narHash=sha256-QBx10%2Bk6JWz6u7VsohfSw8g8hjdBZEf8CFzXH1/1Z94%3D (2024-05-02 09:00:52)
├───nixpkgs: github:NixOS/nixpkgs/2057814051972fa1453ddfb0d98badbea9b83c06?narHash=sha256-5ZSVkFadZbFP1THataCaSf0JH2cAH3S29hU9rrxTEqk%3D (2024-05-12 17:21:43)
└───pre-commit-hooks: github:cachix/pre-commit-hooks.nix/c182c876690380f8d3b9557c4609472ebfa1b141?narHash=sha256-/5u29K0c%2B4jyQ8x7dUIEUWlz2BoTSZWUP2quPwFCE7M%3D (2024-05-13 14:15:11)
    ├───flake-compat: github:edolstra/flake-compat/0f9255e01c2351cc7d116c072cb317785dd33b33?narHash=sha256-kvjfFW7WAETZlt09AgDn1MrtKzP7t90Vf7vypd3OL1U%3D (2023-10-04 13:37:54)
    ├───flake-utils: github:numtide/flake-utils/b1d9ab70662946ef0850d488da1c9019f3a9752a?narHash=sha256-SZ5L6eA7HJ/nmkzGG7/ISclqe6oZdOZTNoesiInkXPQ%3D (2024-03-11 08:33:50)
    │   └───systems: github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e?narHash=sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768%3D (2023-04-09 08:27:08)
    ├───gitignore: github:hercules-ci/gitignore.nix/637db329424fd7e46cf4185293b9cc8c88c95394?narHash=sha256-HG2cCnktfHsKV0s4XW83gU3F57gaTljL9KNSuG6bnQs%3D (2024-02-28 02:28:52)
    │   └───nixpkgs follows input 'pre-commit-hooks/nixpkgs'
    ├───nixpkgs: github:NixOS/nixpkgs/e367f7a1fb93137af22a3908f00b9a35e2d286a7?narHash=sha256-p7ryWEeQfMwTB6E0wIUd5V2cFTgq%2BDRRBz2hYGnJZyA%3D (2024-03-18 12:38:16)
    └───nixpkgs-stable: github:NixOS/nixpkgs/614b4613980a522ba49f0d194531beddbb7220d3?narHash=sha256-3Eh7fhEID17pv9ZxrPwCLfqXnYP006RKzSs0JptsN84%3D (2024-03-17 17:16:56)

Version

Store path - /nix/store/amwpzhr2hs4ydwgrl90gqkkg46dm0idg-cachix-1.7.3-bin cachix --version - 1.7.3

exarkun avatar Jun 04 '24 16:06 exarkun

The "sometimes" is for the same revision. Several times I've triggered a rebuild of a revision for which the build failed this way and it succeeds the next time.

exarkun avatar Jun 04 '24 16:06 exarkun

Oh, wow, I guess there's a typo somewhere causing a loop. This is going to be quite challenging to debug.

How reproducible is this? Would you be able to run cachix with profiling enabled? It'll print out the location where the error is coming from.

nix profile install github:cachix/cachix/enable-profiling --accept-flake-config
cachix +RTS -xc -RTS watch-exec "${CACHIX_NAME}" -- nix build .#pkg-junit-xml --print-build-logs

sandydoo avatar Jun 04 '24 17:06 sandydoo

I've seen it 3 or 4 times now so ... somewhat reproducible. But I haven't been able to trigger it on demand yet. I can try a bunch of builds w/ the above configuration though and see how it goes.

exarkun avatar Jun 04 '24 18:06 exarkun

I'm seeing this too when running builds on concourse CI.

risicle avatar Jul 06 '24 12:07 risicle

This one has continued to elude reproduction for me. I have the impression that it depends on the state of the cache because seems never to arise twice for a build of the same version of the code (a slightly stronger statement than "never arise twice for the same git revision" because even if I take a git revision for which it has happened and add another commit which switches to the profiling build of cachix and let CI re-run, it doesn't happen again).

exarkun avatar Jul 06 '24 13:07 exarkun

Also finding this hard to reproduce.

risicle avatar Jul 06 '24 15:07 risicle

It's interesting that in another report shows no paths were printed out:

Pushing to cachix
Pushing 41 paths (35079 are already present) using zstd to cache xxx ⏳


All done.
cachix: <<loop>>

Exited with code exit status 1

I'd ask for logs next time it happens.

domenkozar avatar Sep 18 '24 11:09 domenkozar

The most likely cause seems to be using ascii-progress with all the concurrency stuff it's doing in the background using concurrent-output library.

To simplify this we should:

  • merge code between cachix/src/Cachix/Daemon/Progress.hs and cachix/src/Cachix/Client/Command/Push.hs
  • use Progress.hs as the default implementation because it supports retries
  • expose the same progress bar API using plain ansi-terminal to remember the cursor state and render the new output on tick

Marking this as low-priority because it does push the paths and only affects printing.

domenkozar avatar Sep 18 '24 11:09 domenkozar

I'm pretty much ready to conclude that this doesn't reproduce with profling / -xc. I've been running cachix with -xc for the last 9 months on all of my CI jobs and this hasn't happened once since I turned it on. I'll probably turn it off again soon to get rid of all the noise. I hope the <<loop>> doesn't come back...

exarkun avatar Mar 20 '25 10:03 exarkun

Just want to add a data point that I am able to see this reproduce very frequently. I am using cachix --hostname=<my-host> watch-exec <my-cache> -- nix develop .#<my-devshell>

GeneralOneill avatar Jun 11 '25 21:06 GeneralOneill

@GeneralOneill what cachix version are you using?

domenkozar avatar Jun 12 '25 09:06 domenkozar

@GeneralOneill what cachix version are you using?

Currently my CI runner is using cachix 1.7.3.

I've tried to use +RTS -xc -RTS flags, but they didn't work because our cachix version is not built with profiling turned on.

GeneralOneill avatar Jun 12 '25 17:06 GeneralOneill

I have this on my radar.

I saw this once (and for the first time) this week in an older version of our CLI running our backend integration suite. I've configured all of our test runs to run with profiling and -xc enabled, so if it happens again, we'll have an idea where it's coming from. I really wish Haskell just had stack traces by default...

sandydoo avatar Jun 12 '25 19:06 sandydoo