Current nightly build of `shards` is broken
The shards binary in the current nightly build is broken.
$ wget https://output.circle-artifacts.com/output/job/249b749b-4d24-434a-8138-d0b3530b7bf7/artifacts/0/dist_packages/crystal-1.14.0-dev-1-linux-x86_64.tar.gz
$ tar -xzf crystal-1.14.0-dev-1-linux-x86_64.tar.gz
$ crystal-1.14.0-dev-1/bin/shards --version
bash: crystal-1.14.0-dev-1/bin/shards: cannot execute: required file not found
$ ls -lh crystal-1.14.0-dev-1/bin/shards
-rwxr-xr-x 1 root root 3.3M Sep 3 00:12 crystal-1.14.0-dev-1/bin/shards
$ type crystal-1.14.0-dev-1/bin/shards
crystal-1.14.0-dev-1/bin/shards is crystal-1.14.0-dev-1/bin/shards
Not sure what's going on. Might just be a fluke and it'll be fixed in the next build.
I'm not aware we changed anything in the build process of shards.
Anyway, we appear to be missing a validation of the build product. A broken build should never be published.
In the current nightly build the shards executable does work again:
$ crystal-1.14.0-dev-1/bin/shards --version
Shards 0.18.0 [31b44d3] (2024-03-28)
$ ls -lh crystal-1.14.0-dev-1/bin/shards
-rwxr-xr-x 1 root root 5.6M Sep 4 00:12 crystal-1.14.0-dev-1/bin/shards
So this appears to have been a random failure.
That still means we need validation of the build artifacts.
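A minimal sketch of what such a validation could look like, assuming it runs as a plain shell step after packaging. The `smoke_test` name and the place where it would be hooked into the build are hypothetical, not part of distribution-scripts:

```shell
# Hypothetical helper: run a freshly built executable and fail loudly
# if it cannot even start.
smoke_test() {
  if ! "$@" >/dev/null 2>&1; then
    echo "error: smoke test failed: $*" >&2
    return 1
  fi
}

# Example (commented out): refuse to publish when the packaged binary does not run.
# smoke_test crystal-1.14.0-dev-1/bin/shards --version || exit 1
```

A check like this only helps if it runs in an environment that resembles what users have, outside the container that produced the binary.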
And today's nightly build is broken again. So apparently this wasn't a fluke.
$ wget https://output.circle-artifacts.com/output/job/fae3e672-872b-473b-a555-5234fe773654/artifacts/0/dist_packages/crystal-1.14.0-dev-1-linux-x86_64.tar.gz
$ tar -xzf crystal-1.14.0-dev-1-linux-x86_64.tar.gz
$ crystal-1.14.0-dev-1/bin/shards --version
zsh: no such file or directory: crystal-1.14.0-dev-1/bin/shards
$ ls -lh crystal-1.14.0-dev-1/bin/shards
-rwxr-xr-x 1 johannes johannes 3.3M Sep 5 02:13 crystal-1.14.0-dev-1/bin/shards
Digging a bit more into it, it seems the shards binary is actually a dynamically linked executable linking against musl libc. The "no such file or directory" error comes from the fact that the interpreter /lib/ld-musl-x86_64.so.1 is missing on a glibc system.
$ readelf -l crystal-1.14.0-dev-1/bin/shards
Elf file type is DYN (Position-Independent Executable file)
Entry point 0xa6d0
There are 12 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000002a0 0x00000000000002a0 R 0x8
INTERP 0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
0x0000000000000019 0x0000000000000019 R 0x1
[Requesting program interpreter: /lib/ld-musl-x86_64.so.1]
The weird part about this is that we actually do have a check to ensure the shards binary is statically linked:
https://github.com/crystal-lang/distribution-scripts/blob/1b7fb7ff2a2a9d535ec95dd3aedbf8e1fc627212/linux/Dockerfile#L74
So I'm not sure how it could pass that test 😕 It fails when I test locally.
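For local testing, the `readelf` observation above can be turned into a small check. This is only a sketch: it assumes binutils' `readelf` is installed, the function names are made up for illustration, and it is not the check used in the linked Dockerfile.

```shell
# Reads `readelf -l` output on stdin; succeeds if a program interpreter is requested.
requests_interpreter() {
  grep -q 'Requesting program interpreter'
}

# Succeeds only if the given binary has no INTERP program header,
# i.e. it does not depend on a dynamic loader such as ld-musl.
assert_no_interpreter() {
  # LC_ALL=C keeps readelf's messages in English so the grep pattern matches.
  headers=$(LC_ALL=C readelf -l "$1") || return 2
  if printf '%s\n' "$headers" | requests_interpreter; then
    echo "error: $1 requests a program interpreter (dynamically linked)" >&2
    return 1
  fi
}

# Example (commented out):
# assert_no_interpreter crystal-1.14.0-dev-1/bin/shards
```

The important detail is that the check has to run on the file that actually ends up in the tarball; a check that runs before a later step replaces the binary would still pass.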
Today's build is fine again 🤷
And today's as well.
And this is happening again.
https://github.com/athena-framework/demo/actions/runs/11063118636/job/30738844538
I've encountered this in 2 of my projects recently:
- https://github.com/devnote-dev/cling/actions/runs/11753866125/job/32747205715#step:4:88
- https://github.com/devnote-dev/docr/actions/runs/11759771854/job/32759616408#step:3:90
This issue has been mentioned on Crystal Forum. There might be relevant details there:
https://forum.crystal-lang.org/t/post-mortem-issues-in-the-crystal-1-14-1-release-process/7610/1
@ggiraldez suggests the issue might be caused by `make -C shards install` rebuilding the executable. It's not clear why make would consider the build dependency out of date, though. A guess would be that it might be related to file timestamps in docker.
This would certainly explain the observations, and particularly the sporadic nature.
Unfortunately, the CI logs have the docker output truncated so I'm afraid we cannot retrace whether that happened on previous builds (https://github.com/crystal-lang/distribution-scripts/pull/346 is supposed to fix that). I haven't been able to reproduce and observe this locally yet.
Happened again: https://github.com/athena-framework/athena/actions/runs/12941655549/job/36098066968.
I think it's happening continuously at the moment, and the CI of install-crystal is monitoring the situation quite well, I'd say. https://github.com/crystal-lang/install-crystal/actions
Well, any CI workflow that regularly runs with crystal latest documents the effect.
We'd know more if we updated distribution-scripts to show full build logs: https://github.com/crystal-lang/crystal/pull/15368
The logs from the latest nightly build confirm the suspicion: `make install` rebuilds shards, and that rebuild runs without the original configuration (e.g. `static=1`).
https://app.circleci.com/pipelines/github/crystal-lang/crystal/17043/workflows/7577b52c-6964-41f5-9d63-c46c042d4e00/jobs/88511?invite=true#step-102-137607_132
`make --trace` shows it thinks `shard.yml` is newer than `shard.lock`: `update target 'shard.lock' due to: shard.yml`
Okay, so it seems to be the classic issue that timestamps might be slightly off after a git checkout. On top of that, there's an error in the Makefile: the `shard.lock` recipe doesn't touch the target if `SHARDS=false` and we're bootstrapping by downloading `lib/molinillo` with curl. Hence the original `make` build process doesn't mark `shard.lock` as fresh, which then triggers a rebuild on `make install`.
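The mechanism can be illustrated without make itself. The sketch below uses a hypothetical `is_stale` helper that mimics a simplified version of make's freshness rule (a target is out of date if it is missing or older than a prerequisite). The file names mirror the ones above, but the timestamps are made up:

```shell
# Simplified version of make's rule: the target is stale if it does not
# exist or if the prerequisite has a newer modification time.
is_stale() {
  target=$1
  prereq=$2
  [ ! -e "$target" ] || [ "$prereq" -nt "$target" ]
}

workdir=$(mktemp -d)
touch -t 202401010000 "$workdir/shard.lock"  # target, checked out first
touch -t 202401010001 "$workdir/shard.yml"   # prerequisite, slightly newer

# A recipe that runs but never updates the target leaves it stale,
# so the next make invocation runs the recipe again.
if is_stale "$workdir/shard.lock" "$workdir/shard.yml"; then
  echo "shard.lock is stale"
fi

# Touching the target at the end of the recipe marks it as fresh.
touch "$workdir/shard.lock"
if ! is_stale "$workdir/shard.lock" "$workdir/shard.yml"; then
  echo "shard.lock is fresh"
fi

rm -rf "$workdir"
```

Whether the stale state occurs depends on the order in which the checkout happens to write the files, which would fit the sporadic nature of the failures.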
I am honestly surprised this issue has only started appearing quite recently.