use an additional, project-local copy of dependency trees
Extracted from #14265.
Terminology clarification:
- A project is a directory of files, uniquely identified by their hash. Dependencies can export any number of artifacts and packages.
- A dependency is a directed edge between projects. A project may depend on any number of projects. A project may be a dependency of any number of projects.
- A package is a directory of files, along with a root source file that identifies the file referred to when the package is used with @import.
- An artifact is a static library, a dynamic library, an executable, or an object file.
Currently, zig puts all fetched dependencies in the global zig cache, like this:
$GLOBAL_ZIG_CACHE/p/$DEPENDENCY_HASH/*
Then, dependencies are used directly from this directory, and shared among all projects.
This proposal is for zig build to additionally copy each dependency from the global cache into a project-local directory, like this:
$PROJECT_ROOT/zig-deps/$DEPENDENCY_NAME/*
A transitive dependency would look like this:
$PROJECT_ROOT/zig-deps/$NAME1/zig-deps/$NAME2/*
This would be similar to the node_modules directory from npm.
Motivations:
- whether a build requires network access is independent from the state of the global cache system
- it would be possible to wipe the global cache without forcing projects to re-fetch their dependencies. Similarly adding GC or LRU to the global cache would not sometimes delete dependencies for a particular project.
- it would be possible to wipe a project's dependencies without wiping the global cache
- it is easier to find dependencies by name, and locally patch them to test changes
- temporary patches to dependencies would affect only one project; not the entire system globally
- it would become an option to commit the zig-deps directory into source control, or to distribute a tarball that includes the dependencies
- better compile errors when the lines point to dependencies; instead of getting a hash in the file name, you get the package name
Downsides:
- multiple copies of things on disk, wasting disk space
- somebody's going to suggest symlinking and all sorts of complicated stuff to go along with it
- an additional directory alongside zig-out and zig-cache: zig-deps.
Open question: where to store the hash? It's nice to use the dependency name instead of the hash for the directory name, but it does leave the problem of how zig build should detect whether a dependency needs to be updated or not. It can always recompute hashes, but it should not be recomputing hashes on every zig build. Ideally, it would be only one open() call to open the directory of a dependency and find out whether the desired hash is present or not.
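One possible answer to the open question, sketched in Python for illustration only: keep an empty marker file inside each dependency directory whose name embeds the hash, so the freshness check is a single open() on a known path. The marker prefix `.zig-dep-hash-` and the helper names are hypothetical and are not anything zig implements.

```python
import os

# Hypothetical marker prefix; not an existing zig convention.
MARKER_PREFIX = ".zig-dep-hash-"


def marker_path(deps_root, name, expected_hash):
    """Path of the marker file for one dependency and one hash."""
    return os.path.join(deps_root, name, MARKER_PREFIX + expected_hash)


def is_up_to_date(deps_root, name, expected_hash):
    """Return True if the marker for expected_hash exists (one open() call)."""
    try:
        fd = os.open(marker_path(deps_root, name, expected_hash), os.O_RDONLY)
    except (FileNotFoundError, NotADirectoryError):
        return False
    os.close(fd)
    return True


def record_hash(deps_root, name, installed_hash):
    """Drop stale markers and create the marker for the installed hash."""
    dep_dir = os.path.join(deps_root, name)
    os.makedirs(dep_dir, exist_ok=True)
    for entry in os.listdir(dep_dir):
        if entry.startswith(MARKER_PREFIX):
            os.remove(os.path.join(dep_dir, entry))
    with open(marker_path(deps_root, name, installed_hash), "w"):
        pass
```

The marker only records which hash was installed. It does not notice local patches made afterwards, so a forced re-hash would still be needed to detect those.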
Is it correct that the idea is:
- to have global store/cache with flat dependencies under their hashes (with no transitive dependencies, except when they are committed into the VCS)
- in zig build you need to build a tree of all top-level dependencies plus all their transitive dependencies, refetch any that are missing, then recreate this tree in the local directory using copies/symlinks, and only then start building?
> This proposal is for zig build to additionally copy each dependency from the global cache into a project-local directory, like this:
> $PROJECT_ROOT/zig-deps/$DEPENDENCY_NAME/*
> A transitive dependency would look like this:
> $PROJECT_ROOT/zig-deps/$NAME1/zig-deps/$NAME2/*
This approach for local dependencies is problematic, particularly on Windows, and is very noticeable in the Node ecosystem when using the default npm package manager, since that folder layout is how it sets up node_modules. Windows has a very short MAX_PATH of only 260 characters[0][1], so a folder or file whose path is longer than that can become impossible to move, rename, delete, or otherwise operate on.
a flat $NAME-$HASH would likely be better to avoid that
> a flat $NAME-$HASH would likely be better to avoid that
Yep, I agree on that. A flat namespace will support deduplication and make it clearer which packages are actual dependencies. It also keeps you aware of how many deps you actually have.
Duplicate names can be resolved by appending the hash, even if this would make it a bit weird for the user to debug. Another option would be to make transitive duplicates named ${primary_dep}-${secondary_dep}, but only on conflict
> an additional directory alongside zig-out and zig-cache: zig-deps.
Why not store the deps in the local zig cache? It is ephemeral anyway, and can be viewed as a cache (because they can be recomputed/re-downloaded).
> multiple copies of things on disk, wasting disk space
This is not a problem on some modern filesystems (definitely on btrfs and xfs, there may be more) due to copy-on-write. As of some recent coreutils even cp does reflinking by default (equivalent to cp --reflink, but with graceful fallback if the FS does not support it).
Reflinking is not the default mode in zig (I have that in my backlog), but it will become the default at some point.
> Why not store the deps in the local zig cache? It is ephemeral anyway, and can be viewed as a cache (because they can be recomputed/re-downloaded).
because of this:
> it would become an option to commit the zig-deps directory into source control, or to distribute a tarball that includes the dependencies
One option to reduce disk space without symlinking (which causes all sorts of other issues and largely removes the benefit of this proposal) is to use hard links.
This is only possible on some operating systems, and only if the project and global cache are on the same drive, but it would solve the disk space problem with basically no added complexity.
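A minimal sketch of the hard-link idea in Python, assuming a helper of our own called `install_tree` (hypothetical, not part of zig): try `os.link` for each file and fall back to a plain copy when linking fails, for example because the two directories are on different drives or the filesystem does not support hard links.

```python
import os
import shutil


def install_file(src, dst):
    """Hard-link src to dst; fall back to copying if linking fails."""
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    if os.path.lexists(dst):
        os.remove(dst)  # replace whatever was installed before
    try:
        os.link(src, dst)
        return "linked"
    except OSError:
        # Cross-device link, unsupported filesystem, permissions, ...
        shutil.copy2(src, dst)
        return "copied"


def install_tree(src_root, dst_root):
    """Mirror src_root into dst_root, linking files where possible."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for filename in filenames:
            results.append(
                install_file(
                    os.path.join(dirpath, filename),
                    os.path.join(target_dir, filename),
                )
            )
    return results
```

One caveat with this design: a hard-linked file shares its contents with the global cache copy, so an editor that writes in place would change both. That works against the "patches affect only one project" motivation unless the cached files are kept read-only.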
npm with a flat "tree" has a problem with importing transitive dependencies: given project -> packageA -> packageB, a flat structure lets you import packageB directly in project even if you haven't added packageB as a dependency in project's build.zig.zon.
I have a question: what will my workflow be if I need to patch a transitive dependency? If the local tree is created during zig build, does it mean that I need to run zig build first with the original dependency, and only after that will I be able to modify this dependency and re-run zig build?
One idea could be that zig stores the compressed archives in the global cache, and only extracts them when installing to a local project. This would minimize disk space usage and help ensure the original contents of the dependency remain intact (a developer can't accidentally modify the contents of the files in the global cache). It also means there's no temptation or path to using "symlinks" to the global cache files.
On IRC I asked whether we should store decompressed archives in the global cache, so we can use more efficient means to decompress the file (I mentioned copy_file_range).
Turns out the supposedly more efficient ways to decompress the file are not actually more efficient. GNU tar wins (unsurprisingly, it is well optimized), followed by Andrew's stdlib implementation, which uses pread/pwrite. sendfile and copy_file_range are a bit slower, and definitely not worth the added complexity.
$ hyperfine --export-markdown table.md -r 5 -w 1 -p 'rm -fr ffmpeg' 'tar -xf ffmpeg.tar' 'tar -xf ffmpeg.tar.gz' './std-tar ffmpeg.tar' './std-tar ffmpeg.tar.gz' './maybe-faster sendfile ffmpeg.tar' './maybe-faster copy_file_range ffmpeg.tar'
Benchmark 1: tar -xf ffmpeg.tar
Time (mean ± σ): 840.9 ms ± 34.8 ms [User: 14.4 ms, System: 813.4 ms]
Range (min … max): 800.5 ms … 890.5 ms 5 runs
Benchmark 2: tar -xf ffmpeg.tar.gz
Time (mean ± σ): 900.8 ms ± 17.9 ms [User: 294.7 ms, System: 858.3 ms]
Range (min … max): 882.3 ms … 921.4 ms 5 runs
Benchmark 3: ./std-tar ffmpeg.tar
Time (mean ± σ): 863.5 ms ± 19.1 ms [User: 5.7 ms, System: 844.4 ms]
Range (min … max): 838.1 ms … 886.8 ms 5 runs
Benchmark 4: ./std-tar ffmpeg.tar.gz
Time (mean ± σ): 1.391 s ± 0.122 s [User: 0.404 s, System: 0.974 s]
Range (min … max): 1.179 s … 1.486 s 5 runs
Benchmark 5: ./maybe-faster sendfile ffmpeg.tar
Time (mean ± σ): 1.089 s ± 0.049 s [User: 0.005 s, System: 1.072 s]
Range (min … max): 1.048 s … 1.143 s 5 runs
Benchmark 6: ./maybe-faster copy_file_range ffmpeg.tar
Time (mean ± σ): 1.162 s ± 0.044 s [User: 0.006 s, System: 1.092 s]
Range (min … max): 1.121 s … 1.235 s 5 runs
Summary
'tar -xf ffmpeg.tar' ran
1.03 ± 0.05 times faster than './std-tar ffmpeg.tar'
1.07 ± 0.05 times faster than 'tar -xf ffmpeg.tar.gz'
1.29 ± 0.08 times faster than './maybe-faster sendfile ffmpeg.tar'
1.38 ± 0.08 times faster than './maybe-faster copy_file_range ffmpeg.tar'
1.65 ± 0.16 times faster than './std-tar ffmpeg.tar.gz'
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| tar -xf ffmpeg.tar | 840.9 ± 34.8 | 800.5 | 890.5 | 1.00 |
| tar -xf ffmpeg.tar.gz | 900.8 ± 17.9 | 882.3 | 921.4 | 1.07 ± 0.05 |
| ./std-tar ffmpeg.tar | 863.5 ± 19.1 | 838.1 | 886.8 | 1.03 ± 0.05 |
| ./std-tar ffmpeg.tar.gz | 1390.6 ± 122.5 | 1178.6 | 1486.2 | 1.65 ± 0.16 |
| ./maybe-faster sendfile ffmpeg.tar | 1088.6 ± 48.7 | 1047.6 | 1142.7 | 1.29 ± 0.08 |
| ./maybe-faster copy_file_range ffmpeg.tar | 1161.7 ± 43.8 | 1121.2 | 1235.0 | 1.38 ± 0.08 |
Files: std-tar.txt maybe-faster.txt
Built with:
for f in maybe-faster.zig std-tar.zig; do zig build-exe -lc -OReleaseFast $f; done

😂😂😂
zig fetch --global-cache-dir vendor --save https://github.com/andrewrk/mime/archive/refs/tags/1.0.0.tar.gz
I recently started using zig and am investigating how to use a vendor directory to store all dependencies, like composer or cargo. I am not sure if this is the right way to do it.
zig fetch --global-cache-dir vendor xxx.tag.gz
zig build --global-cache-dir vendor
🤔🤔🤔 This does not build using vendor.
The zig-cache directory is intended to be excluded from source control.
I've created a PR for this: #20150. At the moment the package hash is just stored "as-is" in zig-deps. I think this has some limitations, and having thought about it, I'm thinking of implementing something like:
- If the package has a manifest, use the manifest name for it in zig-deps:
  - If there is already a package with that name in zig-deps (but a different version), each folder (including the original conflicting one) is renamed to <manifest name>-<@version>, and if there are still hash differences, then it should be <manifest-name>-<@version>-<hash>.
- If the package doesn't have a manifest, then we use the name provided in the build.zig.zon for those referenced directly by the root project. Otherwise we just use the hash for transitive dependencies, as there is no unique logical name that can be used.
Potentially using a lock file or some-such to keep track of things and having the ability to force hashes to be re-checked based on the folder contents of the packages.
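The naming rules above could be sketched roughly as follows. This is an illustration in Python, not code from the PR: the `Package` record and `assign_dir_names` are hypothetical names, and the sketch assumes one reading of the rules, namely that the hash suffix is added only for packages that still collide on name and version.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Package:
    hash: str
    manifest_name: Optional[str] = None  # name from the package's own manifest
    version: Optional[str] = None        # version from the package's own manifest
    zon_name: Optional[str] = None       # name in the root build.zig.zon (direct deps only)


def assign_dir_names(packages):
    """Map each package hash to a directory name under zig-deps."""
    by_name = defaultdict(set)          # manifest name -> distinct hashes
    by_name_version = defaultdict(set)  # (name, version) -> distinct hashes
    for p in packages:
        if p.manifest_name is not None:
            by_name[p.manifest_name].add(p.hash)
            by_name_version[(p.manifest_name, p.version)].add(p.hash)

    names = {}
    for p in packages:
        if p.manifest_name is None:
            # No manifest: direct deps use the build.zig.zon name, others the hash.
            names[p.hash] = p.zon_name or p.hash
        elif len(by_name[p.manifest_name]) == 1:
            names[p.hash] = p.manifest_name
        elif len(by_name_version[(p.manifest_name, p.version)]) == 1:
            names[p.hash] = f"{p.manifest_name}-{p.version}"
        else:
            names[p.hash] = f"{p.manifest_name}-{p.version}-{p.hash}"
    return names
```

The sketch does not handle a build.zig.zon name that happens to equal some other package's manifest name; a real implementation would need a rule for that case too.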
I'm sure there's a fair few things I have failed to consider, but by addressing feedback on this I can hopefully construct a mergeable PR that solves all the problems that need to be solved.
I don't know if this is overcomplicating things relative to:
> This proposal is for zig build to additionally copy each dependency from the global cache into a project-local directory, like this:
> $PROJECT_ROOT/zig-deps/$DEPENDENCY_NAME/*
> A transitive dependency would look like this:
> $PROJECT_ROOT/zig-deps/$NAME1/zig-deps/$NAME2/*
> This would be similar to the node_modules directory from npm.
But I do think the concern about how deep those paths could get on Windows is legitimate. And it would be nice to avoid copying packages locally within a repository at a minimum. (Although my stealth hot take is that transitive dependencies are way more hassle than they are worth, and libraries should be really, really judicious about using them at all in the first place.)
Hi, I'm here with a real-world use case. I recently attempted to package zls for sisyphus. The thing is that sisyphus requires that the source tarball/git repo builds without an internet connection. It would be nice to have something like cargo vendor or go mod vendor to add/update all dependencies at once and then proceed with an offline build.
zig build --fetch will already do that, and is orthogonal to this feature request
edit: the issue was also that they were doing a http request in their configure phase
> zig build --fetch will already do that, and is orthogonal to this feature request
zig fetch (or zig build --fetch) uses the zig-cache directory. As Andrew said, it is not intended to be committed to the source tree. A zig-deps folder with all the dependencies' names would be nice to have.
Though I admit it seems possible to build a zig project offline with --global-cache-dir, fetch and --system. I will open an issue in zls regarding offline builds.
After tinkering for a while, I found out that there is a flag, -Dversion_data_path, which can be set to a local langref.html.in. Thus zls can be built offline! Some reasonable names instead of hashes in the project tree would be nice though :)
Closing in favor of #20180. I think this use case is solved with a combination of that, plus some follow-up tooling, plus the --system flag that is already implemented.
Not only am I reopening this, I'm accepting it. Over time I've decided that the following scenario will be better:
- Global cache stores recompressed, immutable packages
- Local cache stores uncompressed project-local dependency tree.
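To make the split concrete, here is a rough Python sketch of the install step under that scenario: the global cache holds a compressed archive, and the project-local tree receives the extracted files. The function name and the paths passed to it are illustrative assumptions; the thread does not specify the archive format or layout zig would use.

```python
import os
import tarfile


def install_from_archive(archive_path, dest_dir):
    """Extract a (possibly compressed) tar archive into a project-local directory."""
    os.makedirs(dest_dir, exist_ok=True)
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz) itself.
    with tarfile.open(archive_path, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse absolute paths, "..", device files, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

Because the archive is only ever read, the global copy stays immutable, and wiping the local directory just means extracting again.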
After a break I've started to look into some zig stuff again (the issue has been both closed and reopened in that time 😄). Is it worth me trying to resurrect my (very, very old) pull request #20150, or is someone else planning to tackle this?
I don't think this is a particularly contributor-friendly issue.
Are we concerned about the tree structure causing issues with filesystem path limits? Paths longer than 260 characters sometimes cause issues on Windows, for example. The alternative solution that comes to mind is to put all the deps in one flat directory, so long as we can solve the issue of name conflicts.
$PROJECT_ROOT/zig-deps/foo/*
$PROJECT_ROOT/zig-deps/bar/*
$PROJECT_ROOT/zig-deps/bar2/* ?
This also means if there are two dependencies that are exactly the same, they can share the same directory.
P.S. maybe if there's a conflict we start adding context to the name in the form of a dependency name path?
Let's say the dependency tree looks like this:
- mycoolproject
  - foo
  - libA
    - foo
  - libB
    - foo
  - libC
    - libA
      - foo
$PROJECT_ROOT/zig-deps/mycoolproject.foo
$PROJECT_ROOT/zig-deps/libA.foo
$PROJECT_ROOT/zig-deps/libB.foo
$PROJECT_ROOT/zig-deps/libC.libA.foo
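As a sketch of how such names might be derived (Python, with a hypothetical `flat_names` helper and a nested-dict tree shape chosen only for this example): names that occur once keep their bare name, and names that occur more than once are prefixed with the dotted path of the dependencies that pull them in, or with the root project name for direct dependencies.

```python
from collections import Counter


def flat_names(root_name, tree):
    """Flatten a dependency tree into directory names.

    `tree` maps a dependency name to its own subtree (same shape).
    """
    entries = []  # (name, tuple of ancestor names below the root)

    def walk(subtree, ancestors):
        for name, children in subtree.items():
            entries.append((name, ancestors))
            walk(children, ancestors + (name,))

    walk(tree, ())
    counts = Counter(name for name, _ in entries)

    result = []
    for name, ancestors in entries:
        if counts[name] == 1:
            result.append(name)  # no conflict: bare name
        else:
            prefix = ancestors if ancestors else (root_name,)
            result.append(".".join(prefix + (name,)))
    return result


tree = {
    "foo": {},
    "libA": {"foo": {}},
    "libB": {"foo": {}},
    "libC": {"libA": {"foo": {}}},
}
names = flat_names("mycoolproject", tree)
```

For the tree above, the four foo entries come out as mycoolproject.foo, libA.foo, libB.foo, and libC.libA.foo. The sketch treats every occurrence of a name as a conflict; it does not compare hashes, so identical packages are not merged into one directory.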
I'm not suggesting a tree structure. For starters I expect this to have the same structure as master branch global cache.
Out of curiosity, as I'm out of the loop on the structure of the master branch global cache:
> For starters I expect this to have the same structure as master branch global cache.
Will this still be amenable to setting up hermetic projects (i.e. will it be suitable for us to check local dependency copies in to our source control systems as-is)? Is this still a goal as per the issue description?
> it would become an option to commit the zig-deps directory into source control, or to distribute a tarball that includes the dependencies.
(Also thank you for responding so promptly before, will see if I can find smaller issues to tackle & contribute to, in order to get back into things.)