zig icon indicating copy to clipboard operation
zig copied to clipboard

Support file:/// URIs and relative paths in package manager

Open AdamGoertz opened this issue 2 years ago • 20 comments

Closes #14339. I'm definitely looking for some feedback on this. The core functionality is there, but there are some design choices around caching of local files that I'd like to get other peoples' thoughts on.

This PR adds support in the package manager for fetching dependencies from both absolute and relative paths. For example, a project with the following dependency structure successfully compiles:

base/
├── build.zig
├── build.zig.zon
├── src
│   └── . . .
dep/
├── build.zig
├── build.zig.zon
├── dep2
│   ├── build.zig
│   └── src
│       └── . . .
└── src
    └── . . .

base/build.zig.zon:

.{
    .name = "base",
    .version = "0.0.0",
    .dependencies = .{
        .dep = .{
            .url = "file:///home/adam/dep/",
            .hash = "1220ebfc0074d94ecdab83cef08dae264eeb0c8ae6ee9734fd9bcd3fe83a6236e8a1",
        },
    },
}

dep/build.zig.zon:

.{
    .name = "dep",
    .version = "0.0.0",
    .dependencies = .{
        .dep2 = .{
            .url = "./dep2",
            .hash = "1220abdec7efb3042b27db55066eae97088b527be45ab7dda0fe706d6a6f64d3cf60",
        }
    }
}

File dependencies can be provided as either file:/// URIs or as relative paths (which must be denoted by a leading . or .. to distinguish them from URIs. These paths can refer to either a directory containing a build.zig or to a file that the package manager is capable on unpacking, such as a .tar.gz.

In the case of a file that must be unpacked, such as a .tar, it is unpacked into the global cache in the same way as if the file was fetched from the internet. However, if the provided .url field points to a directory, no caching is performed and the build system instead references the files using the provided paths. Relative paths are resolved relative to the build.zig of the package that declared the dependency.

There are several reasons I have so far chosen not to cache directories. First, because doing so makes resolving relative paths for directories that are nested multiple layers down the dependency tree significantly more complicated. Second, to support monorepo style projects where working on multiple packages simultaneously is a common use case. If these dependencies were cached, then the cache would have to be cleared in between each build in order to make changes to dependent packages visible to the compiler.

The downside of not caching directories that already exist on the filesystem is that, in order to guarantee the provided hash (in build.zig.zon is kept in sync, dependencies must be hashed on each build. This introduces some performance penalty for using these types of dependencies. Additionally, the hash in build.zig.zon must be updated every time a dependency is edited (which is exactly the intended behavior, but is a hassle when developing multiple packages simultaneously).

Let me know if you have any feedback or suggestions.

AdamGoertz avatar Feb 09 '23 02:02 AdamGoertz

Apologies, I'm not that familiar with Zig(so take the following question with that caveat).

Is this prone to a path traversal attack? If I clone a git repo that has a build.zig.zon with .url = "file:///etc/passwd" and then I run zig build, will it actually read that file?

komuw avatar Feb 15 '23 13:02 komuw

relative paths are not URIs, i believe it is inappropriate to add this special casing to std.Uri

praschke avatar Feb 15 '23 15:02 praschke

This type of functionality is useful for projects like https://github.com/SerenityOS/serenity.

In that single project tree are ~60+ libraries and ~80+ applications all with dependencies on each other i.e. appA -> libA -> libB.

Needing hashes, references to archive files or needing absolute paths (how can that work on my machine and yours?) flies in the face of developing multiple things in a single source tree.

leecannon avatar Feb 15 '23 15:02 leecannon

@leecannon i am not saying relative paths should not be supported, i am saying that std.Uri is not the place to implement that support.

additionally, this PR does nothing about the onerous hash requirement.

praschke avatar Feb 15 '23 15:02 praschke

I agree that parsing relative paths inside std.Uri is a bit weird. It does make the subsequent logic a bit easier, since we don't need to handle a special case for relative paths, since absolute paths are already a valid URI scheme.

Regarding the hash requirement, my opinion is that it is important to ensure that the hashes referenced in build.zig.zon stay in-sync with code changes. This problem might be best solved by a build flag, something like -Dauto-update-hashes that automatically edits the hash entries in build.zig.zon with the new hash values for any changed packages.

AdamGoertz avatar Feb 16 '23 02:02 AdamGoertz

Paths are now a possible field for dependencies inside build.zig.zon. So the example above would now look like this:

.{
    .name = "base",
    .version = "0.0.0",
    .dependencies = .{
        .dep = .{
            .path = "/home/adam/dep/",
            // alternatively, .url = "file:///home/adam/dep/",
            .hash = "1220ebfc0074d94ecdab83cef08dae264eeb0c8ae6ee9734fd9bcd3fe83a6236e8a1",
        },
    },
}
.{
    .name = "dep",
    .version = "0.0.0",
    .dependencies = .{
        .dep2 = .{
            .path = "dep2",
            .hash = "1220abdec7efb3042b27db55066eae97088b527be45ab7dda0fe706d6a6f64d3cf60",
        }
    }
}

This removes the special casing inside std.Uri and has the added benefit of allowing relative paths without a leading ..

AdamGoertz avatar Feb 17 '23 03:02 AdamGoertz

Apologies, I'm not that familiar with Zig(so take the following question with that caveat).

Is this prone to a path traversal attack? If I clone a git repo that has a build.zig.zon with .url = "file:///etc/passwd" and then I run zig build, will it actually read that file?

This depends on the permissions the zig compiler is running with. For example, if I run zig build as a normal, non-root, user, attempting to access files like /etc/passwd or /etc/shadow fails with error.AccessDenied. If for some reason you were running a build as root, then it may be able to open and read them. Either way, currently the build system doesn't do anything with open files other than compute their hash. It's possible there's a way of extracting some useful information or primitive from that, but I'm not knowledgeable enough about cybersecurity to say. What it does not do is copy the listed files anywhere more easily accessible, such as the global cache.

AdamGoertz avatar Feb 18 '23 03:02 AdamGoertz

Yeah, you can right now try to access system files like /etc/passwd from a build.zig, and do much worse than whatever having them in a .zon will do. Hence, you shouldn't be running builds as root.

TUSF avatar Feb 18 '23 03:02 TUSF

this discussion should take place on #14339.

  • running zig as root is obviously an issue on least privilege grounds, but there are plenty of user-readable files with user secrets, several at well-known paths modulo your username
  • dependencies that are files/directories/archives without any zig or zon in them should be made available to build.zig in a manipulable way, or the package manager isn't actually a replacement for git submodules, see #14488
  • build.zig currently can do these things without build.zig.zon, but absolute paths in build.zig.zon might become a way to escape sandboxing if it is implemented, see #14286

praschke avatar Feb 18 '23 09:02 praschke

@andrewrk and @Vexu Would you mind give this a review?

I think local path support is very important for users to try package manager.

jiacai2050 avatar Mar 03 '23 13:03 jiacai2050

I will have a look soon.

andrewrk avatar Mar 03 '23 18:03 andrewrk

I think local path support is very important for users to try package manager.

If there are issues with recursion and directory trees, it would be good to at least allow file:/// access to archive files (.tar.gz, .tgz, .bgz, etc.) which shouldn't allow people to escape sandboxing.

buzmeg avatar Mar 07 '23 02:03 buzmeg

I will have a look soon.

Can we get this pulled into the main branch? Given all the changes in the build system, it really needs to get pulled in so people can start trying it out.

buzmeg avatar Mar 25 '23 23:03 buzmeg

Sorry, not until I look at it. The timing is a bit unfortunate since I was focusing on the build-parallel branch and now I am having a much needed vacation. Please be patient.

andrewrk avatar Mar 26 '23 01:03 andrewrk

Take your time. I don’t want this merged until it’s been reviewed either. We appreciate all the work you’ve been doing, and you deserve a break.

AdamGoertz avatar Mar 26 '23 01:03 AdamGoertz

Sorry, not until I look at it. The timing is a bit unfortunate since I was focusing on the build-parallel branch and now I am having a much needed vacation. Please be patient.

Understandable.

I tried to pull the fork to try it out, but I clearly just don't understand the build.zig.zon stuff clearly enough to provide credible feedback. :(

buzmeg avatar Mar 26 '23 02:03 buzmeg

As a solution to the last bullet point in the OP, does it make sense to have a switch to ignore the hash check when referencing a directory (either absolute or relative) when building? I can see the hash check being important for people that vendor their dependencies, but for someone working in a monorepo style (as noted) it can become a hassle quickly. I can't think of a reason to have the same switch for tarball/remote packages, but I digress.

EDIT: just realized I regurgitated https://github.com/ziglang/zig/issues/14339#issuecomment-1474518628

haze avatar May 22 '23 21:05 haze

At this point, we need to get something into people's hands so everybody can start iterating on it. It's not going to be perfectly correct, that's fine. But this is currently holding up the possibility of exploring the space as well as people sending in patches to make things work better as well as people writing tutorials about it.

buzmeg avatar May 22 '23 21:05 buzmeg

Any plans to merge this pull request into master? I've tried to compile Zig with this patch to no avail and quite some trouble as well - it would be nice to have this be merged into master and for an official Zig build to be provided that supports this very-well needed feature.

Thanks in advance (the conflicts should also be resolved, too).

HTGAzureX1212 avatar Jun 16 '23 10:06 HTGAzureX1212

Any plans to merge this pull request into master? I've tried to compile Zig with this patch to no avail and quite some trouble as well - it would be nice to have this be merged into master and for an official Zig build to be provided that supports this very-well needed feature.

Thanks in advance (the conflicts should also be resolved, too).

I should have a chance to resolve the conflicts this coming week (currently traveling). It still needs review before it can be merged, hopefully now that intern pool is landed and SYCL is over, the core team will have some more time.

AdamGoertz avatar Jun 16 '23 14:06 AdamGoertz

@AdamGoertz how;'s everything coming along?

HTGAzureX1212 avatar Jun 29 '23 07:06 HTGAzureX1212

The CI seemingly is failing on Windows regarding the usage of allocator.dupeZ

HTGAzureX1212 avatar Jul 09 '23 08:07 HTGAzureX1212

The Windows release workflow failed with a weird AccessDenied error. Perhaps rerun that workflow?

HTGAzureX1212 avatar Jul 10 '23 23:07 HTGAzureX1212

Did this get integrated? It really needs to be in 0.11 so people can start banging on it ...

buzmeg avatar Jul 19 '23 19:07 buzmeg

No, this PR has not been merged yet. I'm currently running off a local build with this patch.

HTGAzureX1212 avatar Jul 20 '23 13:07 HTGAzureX1212

cc @andrewrk (I apologize for the ping) but could this PR be reviewed and merged? This has taken a while and it is quite a hassle to compile zig from source (took me several days to fix everything to work correctly) so having an official master build from the website would be preferred.

HTGAzureX1212 avatar Jul 23 '23 09:07 HTGAzureX1212

OK, so the big question here:

Assuming the use case this solves is to vendor a dependency and then use it, why would you use this (file:///) when you could instead use addAnonymousDependency?

Example from the zig test suite:

https://github.com/ziglang/zig/blob/2826f78a61c014b7cfb9e7ce1a7efce31b24d0c9/test/tests.zig#L621

You would put something like this in your build.zig script:

const dep = b.anonymousDependency("vendor/build.zig", @import("vendor/build.zig"), .{});

This is equivalent to the dependency function, except it's used when you have the dependency source code in-tree rather than fetched externally via a build.zig.zon file.

Let's sort out the use cases and make sure we have a suggested best practice for each one. I think I have outlined the "monorepo" use case above. What else?

andrewrk avatar Jul 29 '23 03:07 andrewrk

More of a capability than a use case, but one advantage of the file:/// scheme over an anonymous dependency is that it doesn't have to be a source tree, it can also be an archive (e.g. tar.gz or any other file format that we support).

AdamGoertz avatar Jul 29 '23 03:07 AdamGoertz

Let's sort out the use cases and make sure we have a suggested best practice for each one. I think I have outlined the "monorepo" use case above. What else?

addAnonymousDependency is relative to build.zig only, right? The use-cases I expect .url = file://home/deps or .path = "/home/foo/bar" are mostly out-of-tree references.

What about when a tree of files are controlled by a different source control system? For example, you can't really map or hash gigantic game assets into everybody's local source tree. Those are normally mounted on a consistent directory across systems by some network file system so that you can see the whole thing but only copy/reference the specific files you use. Or the upstream dependency is a Mercurial repository so can't be mapped as a git submodule?

What about if things are controlled by an external entity like NixOS or Fedora Silverblue? Nothing in NixOS is ever going to accept a build.zig or build.zig.zon and to replicate the Nix build chain you would have to pull in gigabytes and gigabytes of dependencies just like Nix does. I'm pretty sure that files there are consistent across machines and builds: For example: file://nix/store/zpgj9dsn75grr1lqybw2si3bbzlvcd17-libcap-ng-0.8.3/ However, I'm simply a tyro on NixOS or Fedora Silverblue. I suspect mitchellh would be a better source on that kind of stuff.

For my personal use case, it was particularly irritating that I could specify a repository on Github which would then go download the githubarchive.tgz as a URL but when I specified "file://home/me/localgitarchive.tgz" the operation would fail.

buzmeg avatar Jul 29 '23 07:07 buzmeg

For the monorepo use-case specifically, one scenario that isn't handled well by anonymousDependency is sibling packages: if I have a monorepo with two package directories core and extra, and I need extra to depend on core, I can't do @import("../core/build.zig") from within extra/build.zig because imports outside the package root are not allowed.

One workaround for this today is to create a symlink to core somewhere under extra (I've done this in one of my projects for this reason), but that doesn't seem optimal. With this change, I could just add a dependency using the path ../core, which is a lot cleaner, in my opinion.

ianprime0509 avatar Jul 29 '23 14:07 ianprime0509