nix icon indicating copy to clipboard operation
nix copied to clipboard

Don't download source tarball if not needed.

Open yshui opened this issue 1 year ago • 11 comments

Motivation

flake tarball inputs generate two store paths, it first downloads the source file into the store, then unpacks it into another store path.

assuming the file at the source URL hasn't changed, when I run nix flake update, if both paths exist, all is good. nix will fetch the source URL with an etag, and get a 304 response, and do nothing. but if the source file has been deleted from the store, then nix will redownload the source file, regardless if the unpacked store path is still valid.

this commit makes sure if the unpacked source exists in the store, the source file will only be downloaded if it has changed.

Context

  • I changed cache->lookupExpired(store, attrs) so it returns the cache entry along with if the store path is still valid, instead of just throwing it away when the path is invalid.
  • I added a parameter onlyIfChanged to downloadFile. when true, downloadFile will only download the source file if it's different from the cache, even when the source file doesn't exist in store.
  • Finally I changed downloadTarball to utilize this parameter to achieve what I described in the motivation section

Priorities and Process

Add :+1: to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

yshui avatar Jan 20 '24 16:01 yshui

The other option is to do what prefetch is already doing - don't call downloadFile and don't store the source file in the store. This also saves some space if you don't aggressively run nix store gc.

I am fine with either option.

yshui avatar Jan 20 '24 22:01 yshui

BTW other input types have similar problems as well. For example for github sources, the request to api.github.com is made without etag if the json file is not in store, even when the server would have returned 304. But the json file is pretty small so this is less of a problem.

yshui avatar Jan 20 '24 22:01 yshui

Is this trying to fix #9814?

Ericson2314 avatar Jan 22 '24 15:01 Ericson2314

Is this trying to fix #9814?

No, they are distantly related, but this PR won't fix that issue.

yshui avatar Jan 22 '24 16:01 yshui

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-01-22-nix-team-meeting-minutes-117/38838/1

nixos-discourse avatar Jan 26 '24 09:01 nixos-discourse

Hi, so what's the decision on this PR? what do I need to do to get this merged?

yshui avatar Feb 11 '24 09:02 yshui

@yshui the other PR I was talking about #10038, is now merged. Now would be a good time to check what, if anything, is still needed.

Ericson2314 avatar Feb 21 '24 21:02 Ericson2314

@Ericson2314 what does #10038 do?

yshui avatar Feb 22 '24 00:02 yshui

@yshui It caches downloaded tarballs (not individual files) in a separate git repo (for file-granularity dedup) instead of the store. I suppose it is still possible for data to be in the cache DB but not the git repo, so the principle you are trying to ensure works still applies, but the implementation details are quite different.

Ericson2314 avatar Feb 22 '24 00:02 Ericson2314

@Ericson2314 hmm, interesting. so if i do a nix store gc, would it also delete the tarball from the git repository? and it would be redownloaded next time i ask for unpacked tarball even if it's not changed?

yshui avatar Feb 22 '24 13:02 yshui

@yshui nix store gc would not affect the tarball cache. I don't know how it is garbage collected.

Ericson2314 avatar Feb 22 '24 14:02 Ericson2314

@Ericson2314 I looked into it, and I think implementing this became a bit easier after the new git cache, but it looks like it also introduced some bugs (#10080), I'll update this PR after things are settled a bit more.

yshui avatar Feb 27 '24 16:02 yshui

Hold on, why do we need to cache the tarball at all? What we want is the unpacked tarball, why are we keeping the tarball file in cache?

(or did I misunderstand? is the unpacked tarball being cached?)

yshui avatar Feb 27 '24 22:02 yshui