fetchTree attempts download despite narHash existing in store

Open Warbo opened this issue 9 months ago • 10 comments

Describe the bug

The output of fetchTree { type = "file"; narHash = "..."; url = "..."; } seems to depend only on the narHash, since fetching the same file from different URLs gives the same outPath.

However, unlike a fixed-output derivation, fetchTree will try to perform the download (or at least attempts to connect to the URL) even if the outPath already exists. I think this is due to checking a URL-based cache, but not checking whether the store path already exists.
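For comparison, here is a minimal sketch of the fixed-output-derivation behaviour referred to above, using the `<nix/fetchurl.nix>` expression that ships with Nix. The arguments `url` and `sha256` are placeholders, and `sha256` here is the flat hash of the file, not a narHash. It also assumes `<nix/fetchurl.nix>` can be resolved in the evaluation mode being used. The output path is derived only from `name` and the hash, so changing `url` leaves the output path unchanged, and the builder is not run when that path is already valid in the store.

```nix
# Sketch only: `url` and `sha256` are placeholder arguments.
# `sha256` is the flat (not NAR) hash of the downloaded file.
{ url, sha256 }:

import <nix/fetchurl.nix> {
  inherit url sha256;
  # Pin the name so the output path does not depend on the URL's basename.
  name = "source";
}
```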

Steps To Reproduce

Use fetchTree with a narHash to fetch a file from a URL, and note its outPath. Then try it with the same narHash and a different URL. It will attempt to connect/download, even though we already have that outPath.

Here's a concrete example, fetching the same file from multiple IPFS gateways (I got this IPFS CID using printf 'hello world' | ipfs block add):

Fetch from ipfs.io with an empty narHash, to find what the narHash should be (I'm on NixOS, but using Nix 2.27 for some unrelated git-hashing fixes):

$ nix repl
Nix 2.27.0pre19700101_dirty
Type :? for help.
nix-repl> builtins.fetchTree { type = "file"; url = "https://ipfs.io/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e"; narHash = ""; }

error:
       … while calling the 'fetchTree' builtin
         at «string»:1:1:
            1| builtins.fetchTree { type = "file"; url = "https://ipfs.io/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e"; narHash = ""; }
             | ^

       … while fetching the input 'https://ipfs.io/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e?narHash=sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%3D'

       error: NAR hash mismatch in input 'https://ipfs.io/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e?narHash=sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%3D', expected 'sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=' but got 'sha256-rkUEKu9bFIg12wLQRf6JtMCf+eR22rABoUvAMi0/IJM='

Fetching with that narHash (the file seems to have been cached):

nix-repl> builtins.fetchTree { type = "file"; url = "https://ipfs.io/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e"; narHash = "sha256-rkUEKu9bFIg12wLQRf6JtMCf+eR22rABoUvAMi0/IJM="; }
{
  narHash = "sha256-rkUEKu9bFIg12wLQRf6JtMCf+eR22rABoUvAMi0/IJM=";
  outPath = "/nix/store/0csgnsbvjfr2axpryskr9v7l43bzjvnd-source";
}

Now we know the narHash, try fetching the same file from a different URL:

nix-repl> builtins.fetchTree { type = "file"; url = "https://cloudflare-ipfs.com/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e"; narHash = "sha256-rkUEKu9bFIg12wLQRf6JtMCf+eR22rABoUvAMi0/IJM="; }

warning: error: unable to download 'https://cloudflare-ipfs.com/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e': Could not resolve hostname (6) Could not resolve host: cloudflare-ipfs.com; retrying in 303 ms
warning: error: unable to download 'https://cloudflare-ipfs.com/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e': Could not resolve hostname (6) Could not resolve host: cloudflare-ipfs.com; retrying in 645 ms
warning: error: unable to download 'https://cloudflare-ipfs.com/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e': Could not resolve hostname (6) Could not resolve host: cloudflare-ipfs.com; retrying in 1049 ms
warning: error: unable to download 'https://cloudflare-ipfs.com/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e': Could not resolve hostname (6) Could not resolve host: cloudflare-ipfs.com; retrying in 2426 ms
error:
       … while calling the 'fetchTree' builtin
         at «string»:1:1:
            1| builtins.fetchTree { type = "file"; url = "https://cloudflare-ipfs.com/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e"; narHash = "sha256-rkUEKu9bFIg12wLQRf6JtMCf+eR22rABoUvAMi0/IJM="; }
             | ^

       … while fetching the input 'https://cloudflare-ipfs.com/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e?narHash=sha256-rkUEKu9bFIg12wLQRf6JtMCf%2BeR22rABoUvAMi0/IJM%3D'

       error: unable to download 'https://cloudflare-ipfs.com/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e': Could not resolve hostname (6) Could not resolve host: cloudflare-ipfs.com
[0.0 MiB DL]

Cloudflare have shut down that IPFS gateway, but we still attempted to connect to it despite already having that file in our store.

If we try another, working URL then it will re-download the file, but the output is identical to using the original ipfs.io URL:

nix-repl> builtins.fetchTree { type = "file"; url = "https://gateway.pinata.cloud/ipfs/bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e"; narHash = "sha256-rkUEKu9bFIg12wLQRf6JtMCf+eR22rABoUvAMi0/IJM="; }
{
  narHash = "sha256-rkUEKu9bFIg12wLQRf6JtMCf+eR22rABoUvAMi0/IJM=";
  outPath = "/nix/store/0csgnsbvjfr2axpryskr9v7l43bzjvnd-source";
}

Expected behavior

If the outPath already exists in our store, then those fetchTree calls should return the { narHash = "..."; outPath = "..."; } result immediately, without attempting to download the URL.
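The expected behaviour can be approximated by hand with a small wrapper. This is a sketch under stated assumptions, not how fetchTree is implemented: `knownOutPath` is a hypothetical argument that the caller must already know (for example, recorded from an earlier successful fetch), and the expression assumes impure evaluation, since `builtins.pathExists` and `builtins.storePath` on arbitrary store paths are restricted otherwise.

```nix
# Sketch only: `knownOutPath` is a hypothetical, caller-supplied store path.
{ url, narHash, knownOutPath }:

if builtins.pathExists knownOutPath
then {
  # Reuse the existing store path without touching the network.
  inherit narHash;
  # storePath attaches string context, so the path is tracked as a dependency.
  outPath = builtins.storePath knownOutPath;
}
else builtins.fetchTree {
  type = "file";
  inherit url narHash;
}
```

The obvious weakness is that the store path has to be supplied from outside, because plain Nix offers no direct way to compute it from the narHash alone.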

Metadata

nix-env (Nix) 2.27.0pre19700101_dirty

Additional context

Related issues:

  • https://github.com/NixOS/nix/issues/9570 seems to also be caused by fetchTree running "eagerly", in a way that fixed-output derivations wouldn't.
  • https://github.com/NixOS/nix/issues/9077 would make fetchTree act more like a fixed-output derivation. It doesn't mention the outPath being independent of the input URL, or fetchTree being too "eager" to perform a download when it doesn't need to.

I'm currently working around this in a rather clunky way, by using a fixed-output derivation that uses /bin/sh to make a copy of the fetched file. This way, I can query whether the outPath already exists without having to call fetchTree (I use /dev/null instead):

with rec {
  inherit (builtins) currentSystem derivation fetchTree getEnv pathExists;

  override = getEnv "IPFS_GATEWAY";
  gateway = if override == "" then "https://ipfs.io" else override;

  fixed = src: derivation {
    name = "source";
    builder = "/bin/sh";
    system = currentSystem;
    outputHashMode = "nar";
    outputHash = narHash;
    args = [
      "-c"
      ''read -r -d "" content < ${src}; printf '%s\n' "$content" > "$out"''
    ];
  };
  existing = (fixed "/dev/null").outPath;
  file = if pathExists existing then existing else fixed (fetchTree {
    inherit narHash;
    type = "file";
    url = "${gateway}/ipfs/${cid}";
  });
};
file

Warbo avatar Mar 25 '25 12:03 Warbo

I'm seeing the same thing, with an input of type github. I ran a simple repro like this:

docker run -it --rm nixos/nix

Then this command:

nix eval \
  --extra-experimental-features nix-command \
  --extra-experimental-features flakes \
  --expr 'builtins.fetchTree "github:NixOS/nixpkgs/fdfc4347e915779fe00aca31012e23941b6cd610?narHash=sha256-pCglMme56MWxtTNRWrLj55/eJXw4dX4HmZYXUm6%2BDO4%3D"'

It downloads a whole Nixpkgs checkout. I verified that the reported outPath ended up in the store. Then I ran rm -r ~/.cache and ran the command again, and it downloaded everything again.
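For reference, the same input can also be written as an attribute set instead of a URL-style string. This is only a restatement of the expression in the command above, with the percent-encoding of the narHash undone (`%2B` is `+`, `%3D` is `=`). It is not a fix, and whether it behaves any differently with respect to re-downloading is not established in this thread.

```nix
# Attribute-set form of the github flakeref used in the command above.
builtins.fetchTree {
  type = "github";
  owner = "NixOS";
  repo = "nixpkgs";
  rev = "fdfc4347e915779fe00aca31012e23941b6cd610";
  narHash = "sha256-pCglMme56MWxtTNRWrLj55/eJXw4dX4HmZYXUm6+DO4=";
}
```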

FWIW this was already reported and marked as fixed once before: https://github.com/NixOS/nix/issues/10104.

thomasjm avatar Apr 01 '25 07:04 thomasjm

I poked around in the code and found that the code path which reuses the path in the store requires that isFinal() return true:

https://github.com/NixOS/nix/blob/9ed5482545b609b095d3597ef31eaa64c9ad5ed8/src/libfetchers/fetchers.cc#L313-L320

and isFinal checks some secret attr called __final:

https://github.com/NixOS/nix/blob/9ed5482545b609b095d3597ef31eaa64c9ad5ed8/src/libfetchers/fetchers.cc#L158-L161

My first thought was to try passing __final = true as part of my attrs to builtins.fetchTree, but it seems that's not allowed:

https://github.com/NixOS/nix/blob/9ed5482545b609b095d3597ef31eaa64c9ad5ed8/src/libexpr/primops/fetchTree.cc#L203-L204

thomasjm avatar Apr 01 '25 21:04 thomasjm

See #10612 for the motivation why this is the case (in short: the narHash doesn't guarantee that we have all the other attributes that the fetcher might return, like lastModified and revCount).

edolstra avatar Jun 04 '25 19:06 edolstra

A narHash alone is not quite enough to avoid a download. More attributes are needed; these are currently supplied by the fetcher cache in ~/.cache, so a re-fetch is forced when that cache is missing. In other words, that flakeref might be locked in some sense, but because of the missing attributes it can't be used to create a proper lockfile without re-fetching or checking the fetcher cache.

Needs some design work.

Needs some clarification on whether adding those other attributes would fix the problem (lastModified? rev/ref for git attrs., etc?) and whether documenting that would be enough.

tomberek avatar Jun 04 '25 19:06 tomberek

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2025-05-04-nix-team-meeting-minutes-230/65206/1

nixos-discourse avatar Jun 04 '25 22:06 nixos-discourse

Hey, thanks for getting back to me.

I'll just add one user's perspective: it seems like fetchTree is doing double-duty here: on one hand it's part of the machinery underlying flakes, and on the other hand it's being positioned as a nice unified replacement for all the old builtin fetchers (fetchGit, fetchTarball, etc.). These attributes like lastModified or revCount seem like flake concerns, not fetcher concerns -- so I would hope not to see flake concerns start to affect the fetchers.

thomasjm avatar Jun 05 '25 00:06 thomasjm

Some of the attributes like ref/rev are to allow reliable fetching. The local short-circuit with just narHash is helpful, but doesn't help fetch the correct thing when others are using that URL.

tomberek avatar Jun 05 '25 01:06 tomberek

Oh yes, I totally expect to use ref/rev like the traditional fetchers do. My example above showed using a URL like builtins.fetchTree "github:NixOS/nixpkgs/<rev>?narHash=<hash>". It's when it comes to the more extraneous-sounding attributes that I become concerned. For one thing, they'd make this URL more cumbersome.

thomasjm avatar Jun 05 '25 01:06 thomasjm

Is there a sensible workaround I can use right now -- is it the case that applying more URL parameters can make it final? The unwanted fetching is a problem.

peterwaller-arm avatar Jun 05 '25 08:06 peterwaller-arm

> I would hope not to see flake concerns start to affect the fetchers.

+1. I avoid all the flake stuff, but it's always nice to improve the builtin fetchers (compared to the bad old days of (import <nixpkgs> {}).fetchFromGitHub { owner = "nixos"; repo = "nixpkgs"; ... }!)

> Some of the attributes like ref/rev are to allow reliable fetching. The local short-circuit with just narHash is helpful, but doesn't help fetch the correct thing when others are using that URL.

Could such attributes just be "passed through" from the arguments into the result? If we evaluate builtins.fetchTree { ....; narHash = "..."; ref = "foo"; rev = "bar"; } and that narHash is already in the store, then we would get { outPath = "..."; narHash = "..."; ref = "foo"; rev = "bar"; }, with those ref and rev values just copied from the arguments. Or could we allow passing in __final as an argument, to say we don't care about those things?
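The "pass-through" idea can be sketched in plain Nix as a wrapper around fetchTree. This is only an illustration of the proposal, not existing fetchTree behaviour: `knownOutPath` is a hypothetical argument standing in for "that narHash is already in the store", the list of attributes to drop is an assumption, and impure evaluation is assumed so that `builtins.pathExists` and `builtins.storePath` may look at the store.

```nix
# Sketch only: `args` are the usual fetchTree arguments; `knownOutPath` is a
# hypothetical, caller-supplied store path for the already-fetched result.
{ args, knownOutPath }:

if builtins.pathExists knownOutPath
then
  # Copy caller-supplied metadata (narHash, ref, rev, ...) into the result,
  # dropping the attributes that only say where to fetch from.
  removeAttrs args [ "type" "url" ]
  // { outPath = builtins.storePath knownOutPath; }
else
  builtins.fetchTree args
```

Any attribute the fetcher would normally compute itself (such as lastModified) is absent from the short-circuited result unless the caller supplies it, which is the gap described in the maintainers' replies.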

It also sounds like substituting has been (temporarily?) disabled for these fetchers, which isn't good for reliability (I've been burned by depending on HTTP URLs multiple times!)

Warbo avatar Jun 05 '25 23:06 Warbo

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-copying-a-store-path-into-the-store/60409/16

nixos-discourse avatar Aug 03 '25 22:08 nixos-discourse