zig icon indicating copy to clipboard operation
zig copied to clipboard

[std.http] zig build fails to download large archives from github

Open LordMZTE opened this issue 2 years ago • 1 comments

Zig Version

0.11.0-dev.2648+3cd19dd89

Steps to Reproduce and Observed Behavior

Note: This is almost certainly a bug in std.http rather than the build system.

  1. Create a minimal build.zig:
const std = @import("std");

pub fn build(b: *std.Build) void {
    _ = b;
}
  1. Create a build.zig.zon with a link to reproduce the bug:
.{
    .name = "test",
    .version = "0.0.0",

    .dependencies = .{
        .test_dep = .{
            // The code of this repo is likely not important, it's just a link that allows to reproduce this.
            .url = "https://github.com/lordmzte/dawn/archive/9e2a193db129ebaf42eccfb9d6bb40055645c5d9.tar.gz",
        },
    },
}
  1. zig build

  2. The build will fail with one of two errors, seemingly at random:

  • error: TarComponentsOutsideStrippedPrefix
  • error: BadReaderState

Note that the tar archive the link points to is very large (>90MB). I suspect that this causes the GitHub server to use chunked encoding, which may cause either invalid data or a crash in the HTTP implementation in the two possible cases.

This is the stack trace produced by a debug build of Zig in the former case:

error: TarComponentsOutsideStrippedPrefix
/home/lordmzte/dev/zig/lib/std/tar.zig:181:13: 0x6000f17 in stripComponents (zig)
            return error.TarComponentsOutsideStrippedPrefix;
            ^
/home/lordmzte/dev/zig/lib/std/tar.zig:130:35: 0x6001c2c in pipeToFileSystem__anon_49378 (zig)
                const file_name = try stripComponents(unstripped_file_name, options.strip_components);
                                  ^
/home/lordmzte/dev/zig/src/Package.zig:561:5: 0x6002a57 in unpackTarball__anon_49145 (zig)
    try std.tar.pipeToFileSystem(out_dir, decompress.reader(), .{
    ^
/home/lordmzte/dev/zig/src/Package.zig:490:13: 0x60139ea in fetchAndUnpack (zig)
            try unpackTarball(gpa, &req, tmp_directory.handle, std.compress.gzip);
            ^
/home/lordmzte/dev/zig/src/Package.zig:282:25: 0x6018e6b in fetchAndAddDependencies (zig)
        const sub_pkg = try fetchAndUnpack(
                        ^
/home/lordmzte/dev/zig/src/main.zig:4402:13: 0x5e7a46a in cmdBuild (zig)
            try fetch_result;
            ^
/home/lordmzte/dev/zig/src/main.zig:298:9: 0x5e46dfb in mainArgs (zig)
        return cmdBuild(gpa, arena, cmd_args);
        ^
/home/lordmzte/dev/zig/src/main.zig:211:5: 0x5e458eb in main (zig)
    return mainArgs(gpa, arena, args);
    ^

Expected Behavior

The dependency is downloaded.

LordMZTE avatar Apr 17 '23 18:04 LordMZTE

I don't have a minimally failing example, but I think I'm seeing the same thing directly in the HTTP client if I get a response that is both chunked and gzip'd. Posting just in case this is helpful.

Snipped loop, reading an HTTP response that has already been wait'd,

            var out = ArrayList(u8).init(allocator);
            var total: usize = 0;
            var buf: [4096]u8 = undefined;
            var w = out.writer();
            while (true) {
                const n = try req.readAll(&buf);
                total += n;
                if (n == 0) break;
                try w.writeAll(buf[0..n]);
            }

Ends with a GZIP error when it tries to read one more checksum byte after already reaching the end of its input stream.

C:\Users\david\scoop\apps\zig-dev\current\lib\std\io\reader.zig:56:37: 0x7ff69b7c954b in readNoEof (load.exe.obj)
            if (amt_read < buf.len) return error.EndOfStream;
                                    ^
C:\Users\david\scoop\apps\zig-dev\current\lib\std\io\reader.zig:256:13: 0x7ff69b7d21dd in readBytesNoEof__anon_10654 (load.exe.obj)
            try self.readNoEof(&bytes);
            ^
C:\Users\david\scoop\apps\zig-dev\current\lib\std\io\reader.zig:294:27: 0x7ff69b7d4201 in readIntLittle__anon_10657 (load.exe.obj)
            const bytes = try self.readBytesNoEof((@typeInfo(T).Int.bits + 7) / 8);
                          ^
C:\Users\david\scoop\apps\zig-dev\current\lib\std\compress\gzip.zig:136:26: 0x7ff69b7807fa in read (load.exe.obj)
            const hash = try self.in_reader.readIntLittle(u32);
                         ^
C:\Users\david\scoop\apps\zig-dev\current\lib\std\http\Client.zig:756:54: 0x7ff69b767cdd in read (load.exe.obj)
            .gzip => |*gzip| gzip.read(buffer) catch return error.DecompressionFailure,
                                                     ^
C:\Users\david\scoop\apps\zig-dev\current\lib\std\http\Client.zig:787:25: 0x7ff69b767ad2 in readAll (load.exe.obj)
            const amt = try read(req, buffer[index..]);

C:\Users\david\src\damp\zig\s3.zig:283:27: 0x7ff69b7689e8 in call (load.exe.obj)
                const n = try req.readAll(&buf);
                
[snip]

The actual response size is only 373 bytes, so the loop should only execute once in normal execution.

dadrian avatar Jun 11 '23 05:06 dadrian

I don't have a minimally failing example

I do, and it is Zig itself :)

.{
    .name = "foo",
    .version = "0.0.0",
    .dependencies = .{
        .zig = .{
            .url = "https://github.com/ziglang/zig/archive/bf827d0b555df47ad2a2ea2062e2c855255c74d1.tar.gz",
        },
    },
}

luizberti avatar Jul 17 '23 18:07 luizberti

Noting that this is blocking Ghostty from adopting the package manager. The following projects are all too build to big:

  • libxml2
  • harfbuzz
  • Mach's xcode-frameworks

mitchellh avatar Aug 04 '23 05:08 mitchellh

This issue should be renamed, the std.http bug has been fixed for a while now. It's now a std.tar issue.

truemedian avatar Aug 23 '23 21:08 truemedian

The reproduction tar (link provided above) contains PAX attributes. These attributes store file paths that don't fit in the header, specifically:

zig-bf827d0b555df47ad2a2ea2062e2c855255c74d1/test/cases/compile_errors/assigning_to_struct_or_union_fields_that_are_not_optionals_with_a_function_that_returns_an_optional.zig
zig-bf827d0b555df47ad2a2ea2062e2c855255c74d1/test/cases/compile_errors/regression_test_2980_base_type_u32_is_not_type_checked_properly_when_assigning_a_value_within_a_struct.zig

#16990 adds support for this.

189900 avatar Aug 27 '23 17:08 189900

What was the issue / fix / PR for the HTTP part of the issue?

dadrian avatar Aug 30 '23 20:08 dadrian

If I remember correctly #15445 fixed the problem encountered here, the redirect was causing the issue.

truemedian avatar Aug 30 '23 20:08 truemedian

If I remember correctly #15445 fixed the problem encountered here, the redirect was causing the issue.

@truemedian Do you remember/know how #15376 and various other issues only mention EndOfStream being encountered for the HTTP issue, while this issue's OP mentions TarComponentsOutsideStrippedPrefix and BadReaderState?

Is it possible that this issue was always about the PAX attributes, and the HTTP issue was merely a red herring?

winterqt avatar Sep 11 '23 03:09 winterqt

The HTTP issue was certainly happening for everyone who tried to make a request to github.com/*/archive, because they all redirect to the github CDN. But that issue has been fixed for the past 5 months.

truemedian avatar Sep 11 '23 14:09 truemedian

Right, but the redirect issue was fixed after this issue was filed.

The other issue about the redirect bug mentions EndOfStream, not any of the two errors that this one points to (TarComponentsOutsideStrippedPrefix and BadReaderState).

This makes me wonder if the HTTP bug was really related here (even though it should be because it uses the same GitHub archive link in the build.zig.zon).

Guess I could test it by reverting the PAX attribute impl and seeing what gets thrown.

winterqt avatar Sep 11 '23 16:09 winterqt

zig-linux-x86_64-0.12.0-dev.389+61b70778b not resolved =/

Can I specify local folder path instead of using .dependencies .url ? I have very slow internet, possibly slowest you ever can imagine, and I was downloaded git repo, put it into tar.gz folder, opened IntelliJ server to host the archive and I am getting subj. And this error (TarComponentsOutsideStrippedPrefix) does not cares if the archive is small (under 1M) when I cut off large unrelated folders.

The repo I am talking is capy, UI for Zig. I was hoping that I can learn Zig while doing UI while I have worst ever days with slowest possibly internet, but...

ivanstepanovftw avatar Sep 17 '23 02:09 ivanstepanovftw