
Build stackage2nix in NixOS sandbox

Open · 4e6 opened this issue 7 years ago · 15 comments

Unable to build nix/stackage2nix on NixOS with nix.useSandbox enabled.

nix.useSandbox: If set, Nix will perform builds in a sandboxed environment that it will set up automatically for each build. This prevents impurities in builds by disallowing access to dependencies outside of the Nix store. This isn't enabled by default for performance. It doesn't affect derivation hashes, so changing this option will not trigger a rebuild of packages.

4e6, Dec 07 '17

related #40

4e6, Dec 07 '17

Currently, I see no way to sandbox the stackage2nix wrapper, for the reasons below.

The stackage2nix wrapper requires the following dependencies to be fetched (see nix/lib.nix).

To satisfy the sandbox requirements, all of these dependencies would have to be prefetched before the build with the standard nix-prefetch-scripts.
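Inside the sandbox, network access is only available to fixed-output fetches, whose content hash is declared before the build (fetchurl and fetchgit work this way). A minimal Python sketch of that check, assuming hex-encoded SHA-256 for readability (Nix usually prints hashes in its own base-32 form); check_fixed_output is a hypothetical name. It also shows why an unversioned URL is a problem: once the content behind it changes, the pinned hash stops matching.

```python
import hashlib


def check_fixed_output(data: bytes, expected_sha256: str) -> bytes:
    """Accept fetched bytes only if they match a hash pinned in advance.

    Hypothetical helper mimicking the idea of a fixed-output fetch; hashes
    are hex-encoded SHA-256 here rather than Nix's own encoding.
    """
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    return data


# The hash is recorded once (e.g. by a prefetch script) and then pinned.
pinned = hashlib.sha256(b"hackage index, day 1").hexdigest()
check_fixed_output(b"hackage index, day 1", pinned)  # accepted

try:
    # Same URL, different day: the content changed, so the fetch is rejected.
    check_fixed_output(b"hackage index, day 2", pinned)
except ValueError as err:
    print(err)
```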

Stackage config files

Only the files themselves are needed, so fetchgit can be used to fetch the fpco/lts-haskell and fpco/stackage-nightly repositories. The only drawback of this approach is less convenient updates: each update would require changing the revision and the hash for both repos, instead of bumping a single cacheVersion parameter.

all-cabal-hashes

To build an exact copy of the Stackage package set, stackage2nix looks up package definitions in all-cabal-hashes by the hash given in the Stackage config (a single version of a package may have several revisions). To do so, all-cabal-hashes must be fetched together with its git metadata. Due to NixOS/nixpkgs #8567, there is no reliable way to do this with fetchgit. One solution might be to fetch a zip archive of a particular version. AFAIK, GitHub can generate such links, but only for archives containing the project files, without the git metadata.
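The hash in the Stackage config is a git object id (GitSHA1), which is why the git metadata matters. Git names a file's contents (a blob) by the SHA-1 of a short header followed by the bytes, the same value `git hash-object` prints. A minimal Python sketch; git_blob_sha1 is a hypothetical helper name, and the cabal snippets are made-up examples:

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the id git gives a blob: SHA-1 of b"blob <size>\\0" + content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# Two revisions of one package version differ in content, so their ids
# differ too, even though both were stored at the same path over time.
rev0 = b"name: foo\nversion: 1.0\n"
rev1 = b"name: foo\nversion: 1.0\nx-revision: 1\n"
print(git_blob_sha1(rev0) != git_blob_sha1(rev1))  # True
```

Because the id is derived from content alone, a lookup by GitSHA1 can only succeed if the exact revision's bytes are available somewhere, which is what the git history provides.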

hackage-db

The issue with hackage-db is that its URL doesn't contain a particular version to put in a fetchurl expression. I assume that hackage-db could be recreated from the all-cabal-hashes repo, but I'm not sure how. Another solution would be to fetch a versioned db from some other place.

4e6, Dec 12 '17

The all-cabal-hashes handling in callHackage: https://github.com/NixOS/nixpkgs/blob/43a62b66d0175b10fd3cc6f1fabdec9d205c171c/pkgs/development/haskell-modules/make-package-set.nix#L126

kirelagin, Jun 09 '18

Regarding the non-determinism of all-cabal-hashes.

I've found this old comment on the original issue thread. The idea is to unpack the git objects and store them uncompressed (https://github.com/bendlas/nixpkgs/commit/4b9c24a5d33407f88457d7e125ca78cbefa30afa). We should be able to do this unpacking as a postUnpack build step.

Downsides:

  • leads to an increased size of the git repository

Upsides:

  • deterministic fetchgit
  • (should be checked) we can access those objects through the libgit interface (no changes are needed for stackage2nix itself)
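One plausible reading of "unpacking" is converting the pack file into loose objects: each object sits in its own file named after its id, with no delta compression against other objects (presumably the "uncompressed" part). Git still deflates each loose object with zlib, since that is the format it reads. A minimal Python sketch under that interpretation, not taken from the linked commit; loose_object is a hypothetical name, and the stored bytes are only reproducible with the same zlib level and implementation.

```python
import hashlib
import zlib


def loose_object(content: bytes, level: int = zlib.Z_DEFAULT_COMPRESSION) -> tuple[str, bytes]:
    """Return (relative path, stored bytes) for a blob kept as a loose object.

    The path depends only on the content; the stored bytes also depend on
    the zlib level, so that must be fixed for the result to be reproducible.
    """
    raw = f"blob {len(content)}\0".encode() + content
    oid = hashlib.sha1(raw).hexdigest()
    # Loose objects live under objects/<first 2 hex digits>/<remaining 38>.
    return f"objects/{oid[:2]}/{oid[2:]}", zlib.compress(raw, level)


path, stored = loose_object(b"hello\n")
print(path)  # objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

Since every revision is stored in full rather than as a delta against its predecessor, this layout trades determinism for size.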

4e6, Jun 09 '18

Do you really care about the git history, or is it just that the tool wants to query the current reference of the checkout?

For the latter, it could make sense to re-build a fake .git database with only the following files:

.git/HEAD -> ref: refs/heads/master
.git/refs/heads/master -> e843a2271a972b8cb6401e67f25d22c8f6fa68cb
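The suggested layout, sketched in Python with hypothetical helper names and the branch and commit id from the example above. Such a directory records only which commit is checked out; it contains no objects (git would probably not even accept it as a repository without an objects/ directory), so it cannot serve lookups of file contents by SHA-1.

```python
from pathlib import Path


def fake_git_files(branch: str, commit: str) -> dict[str, str]:
    """Contents of a minimal .git holding only HEAD and one branch ref."""
    return {
        ".git/HEAD": f"ref: refs/heads/{branch}\n",
        f".git/refs/heads/{branch}": commit + "\n",
    }


def write_files(root: Path, files: dict[str, str]) -> None:
    """Write each relative path under root, creating parent directories."""
    for rel, text in files.items():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)


files = fake_git_files("master", "e843a2271a972b8cb6401e67f25d22c8f6fa68cb")
```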

zimbatm, Jun 09 '18

@zimbatm It's the mapping from SHA-1 to file content that is needed.

binarin, Jun 09 '18

So the tool is not looking at the checked-out content, but querying the git database directly instead?

If you go down the fetchgit + unpacked blobs route, maybe you can make it smaller by using a shallow copy of the database.

Given the level of effort involved, it could make sense to patch upstream as well.

zimbatm, Jun 09 '18

@zimbatm The full history is still needed, as we need all blobs reachable from the required commit.
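To see why a shallow copy is not enough: an older Hackage revision of a .cabal file survives only as a blob in the tree of some older commit. A toy Python sketch over a hypothetical, simplified object graph (string ids, flat trees) comparing what a depth-1 clone would hold with everything reachable through history:

```python
from collections import deque

# Hypothetical, simplified object graph: string ids, flat trees.
# c1 stored revision 0 of foo.cabal (blob b0); c2 replaced it with revision 1 (b1).
OBJECTS = {
    "c1": {"type": "commit", "tree": "t1", "parents": []},
    "c2": {"type": "commit", "tree": "t2", "parents": ["c1"]},
    "t1": {"type": "tree", "blobs": ["b0"]},
    "t2": {"type": "tree", "blobs": ["b1"]},
}


def tip_blobs(objects: dict, commit: str) -> set[str]:
    """Blobs in the commit's own tree, i.e. what a depth-1 clone holds."""
    return set(objects[objects[commit]["tree"]]["blobs"])


def reachable_blobs(objects: dict, commit: str) -> set[str]:
    """Blobs in the trees of the commit and all of its ancestors (BFS)."""
    seen, blobs = set(), set()
    queue = deque([commit])
    while queue:
        c = queue.popleft()
        if c in seen:
            continue
        seen.add(c)
        blobs |= tip_blobs(objects, c)
        queue.extend(objects[c]["parents"])
    return blobs


print(tip_blobs(OBJECTS, "c2"))        # {'b1'}
print(reachable_blobs(OBJECTS, "c2"))  # both b0 and b1
```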

I've discussed this with @4e6, and I think I'll just make a small tool that creates a canonical representation of a git .pack file. If everything else (branches, tags) is properly pruned beforehand, the result will be a working git checkout that is also reproducible. I'll experiment with this approach here. If it works out, I'll try to do the same in fetchgit itself.
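A minimal Python sketch of what such a canonical pack could look like, restricted to blobs for brevity. The byte layout follows git's pack format version 2 ("PACK" header, per-entry type/size header, zlib-deflated data, SHA-1 trailer); the canonicalization rules (deduplicate, sort by object id, no deltas, fixed zlib level) are assumptions of this sketch, not the design of the proposed tool. Git would also need a matching .idx file, which `git index-pack` can generate.

```python
import hashlib
import struct
import zlib

OBJ_BLOB = 3  # object type code for blobs in the pack format


def git_blob_sha1(content: bytes) -> str:
    """Object id git assigns to a blob with this content."""
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()


def entry_header(obj_type: int, size: int) -> bytes:
    """Entry header: 3-bit type, then size (4 bits first, 7 bits per extra byte)."""
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # high bit set: more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def canonical_pack(blobs: list[bytes], level: int = 9) -> bytes:
    """Build a version-2 pack of blobs that depends only on the set of contents.

    Canonical choices made by this sketch: duplicates removed, entries sorted
    by object id, no deltas, one fixed zlib level.
    """
    unique = {git_blob_sha1(b): b for b in blobs}
    body = bytearray(b"PACK" + struct.pack(">II", 2, len(unique)))
    for oid in sorted(unique):
        data = unique[oid]
        body += entry_header(OBJ_BLOB, len(data))
        body += zlib.compress(data, level)
    # Trailer: SHA-1 over everything written so far.
    return bytes(body) + hashlib.sha1(body).digest()


p1 = canonical_pack([b"rev0\n", b"rev1\n"])
p2 = canonical_pack([b"rev1\n", b"rev0\n", b"rev1\n"])
print(p1 == p2)  # True: input order and duplicates don't matter
```

Dropping deltas keeps the sketch simple but gives up most of the compression that keeps the real repository small; a practical generator would also need a deterministic delta strategy, and reproducibility still assumes the same zlib implementation.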

binarin, Jun 09 '18

I tried the approach from my previous comment, unpacking the git objects (https://github.com/bendlas/nixpkgs/commit/4b9c24a5d33407f88457d7e125ca78cbefa30afa).

This increased the size of the all-cabal-hashes checkout from 1.6 GB to 16 GB, which is not acceptable.

4e6, Jun 10 '18

Maybe we can use the GitHub zip archive? It should allow fast random reads.

yorickvP, Jan 14 '19

> Maybe we can use the GitHub zip archive? It should allow fast random reads.

Filenames are used only as a fallback; the primary addressing method is by GitSHA1. So a full .git repo is needed.
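The lookup order described here can be sketched in Python. find_cabal_file and the two dictionaries standing in for the repository are hypothetical; the point is that only the GitSHA1 index can return an older revision, while a lookup by path, which is all a plain file archive offers, returns whatever is stored at that path now.

```python
from typing import Optional


def find_cabal_file(
    by_sha1: dict[str, bytes],
    by_path: dict[str, bytes],
    git_sha1: Optional[str],
    path: str,
) -> Optional[bytes]:
    """Resolve a .cabal file by GitSHA1 first, by file name only as a fallback.

    The fallback returns whatever is currently stored at that path, which
    may be a newer revision than the one the snapshot pinned.
    """
    if git_sha1 is not None and git_sha1 in by_sha1:
        return by_sha1[git_sha1]
    return by_path.get(path)


by_sha1 = {"aaa": b"revision 0"}              # hypothetical ids and contents
by_path = {"foo/1.0/foo.cabal": b"revision 1"}
```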

binarin, Jan 17 '19

As I understand it, the bare git repo is only used because it is more compact than a full checkout. However, there is no good way to get an up-to-date one within a Nix sandbox. I had to revert 86f11b89 while working on updating nixpkgs-stackage. Getting the latest .zip is trivial (builtins.fetchurl), much faster (20s vs 1m20s for git clone) and much smaller (189MB vs 366MB). Zip allows random access for decompression, so it should be fast to extract individual files from it.

yorickvP, Jan 17 '19

@yorickvP To make the latest .zip usable, you'd need to calculate the GitSHA1 of every file inside it and cache this info somewhere. That's doable, except for Hackage revisions (just grep for x-revision): only the latest revision will be available, with no way to fetch older ones. That is exactly what having a .git folder solves.
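The indexing step described here can be sketched in Python with the standard zipfile module: read each member, compute its GitSHA1, and map it to the member name. The archive contents and the foo/1.0/foo.cabal path are hypothetical. The sketch also shows the limitation: an id pinned for an older revision is simply absent from the index.

```python
import hashlib
import io
import zipfile


def git_blob_sha1(content: bytes) -> str:
    """Object id git assigns to a blob with this content."""
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()


def index_zip_by_sha1(zip_bytes: bytes) -> dict[str, str]:
    """Map the GitSHA1 of every file in a zip archive to its member name."""
    index = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            index[git_blob_sha1(zf.read(info))] = info.filename
    return index


# Usage: an archive holding only the latest revision of a hypothetical package.
latest = b"name: foo\nversion: 1.0\nx-revision: 1\n"
older = b"name: foo\nversion: 1.0\n"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("foo/1.0/foo.cabal", latest)
index = index_zip_by_sha1(buf.getvalue())
```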

The proper solution is to create a canonical, reproducible representation of a .git repo. That may require writing a custom git .pack file generator.

binarin, Jan 18 '19

Does stack even expose which cabal file revision is used? Given how intractable the problem is, it doesn't seem worth it just to occasionally get older cabal files, assuming cabal files are rarely updated and the updates don't break anything.

yorickvP, Jan 18 '19

@yorickvP Yes, it's exposed: e.g., search for GitSHA1 in https://raw.githubusercontent.com/commercialhaskell/lts-haskell/master/lts-12.16.yaml

If non-breaking updates are OK, why did you enable sandboxing? =)

binarin, Jan 21 '19