Error downloading dockerfile artifact

Open StealthBadger747 opened this issue 1 year ago • 6 comments

Describe the bug

This started happening very recently. As far as I know, not much has changed in our build environment, and the build process has stayed pretty much the same. Pants seems unable to download the dockerfile dependency for some reason. I SSH'd into the build runner and ran curl on the URL, which works, so it doesn't look like a connectivity issue. The failure has persisted across many runs over the past two days, with a 100% failure rate.

I have found a workaround: enabling the experimental rust_parser, which builds the Dockerfile properly.

Pants version

Which version of Pants are you using?

I have tried 2.21.0 / 2.22.1 and 2.23.0

The logs are the same for all versions.

OS Are you encountering the bug on MacOS, Linux, or both?

Only on Linux, in a self-hosted GHA runner (runs-on)

runner@hostname:~/_work/healthleap/healthleap$ uname -a
Linux hostname 6.8.0-1009-aws #9-Ubuntu SMP Fri May 17 14:39:23 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
runner@hostname:~/_work/healthleap/healthleap$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
....

Additional info Add any other information about the problem here, such as attachments or links to gists, if relevant.

runner@hostname:~/_work/healthleap/healthleap$ pants --no-remote-cache-read --no-remote-cache-write --no-local-cache package --docker-build-verbose packages/hl-api/hl_api:hl-api
21:37:17.62 [INFO] Initializing scheduler...
21:37:17.67 [INFO] Initializing Nailgun pool for 16 processes...
21:37:19.46 [INFO] Scheduler initialized.
21:37:19.66 [WARN] Unmatched globs from packages/hl-hl7-consumer/hl_hl7_consumer_tests:hl_hl7_consumer_tests's `sources` field: ["packages/hl-hl7-consumer/hl_hl7_consumer_tests/*.py", "packages/hl-hl7-consumer/hl_hl7_consumer_tests/*.pyi"], excludes: ["packages/hl-hl7-consumer/hl_hl7_consumer_tests/*_test.py", "packages/hl-hl7-consumer/hl_hl7_consumer_tests/*_test.pyi", "packages/hl-hl7-consumer/hl_hl7_consumer_tests/conftest.py", "packages/hl-hl7-consumer/hl_hl7_consumer_tests/test_*.py", "packages/hl-hl7-consumer/hl_hl7_consumer_tests/test_*.pyi", "packages/hl-hl7-consumer/hl_hl7_consumer_tests/tests.py", "packages/hl-hl7-consumer/hl_hl7_consumer_tests/tests.pyi"]

Do the file(s) exist? If so, check if the file(s) are in your `.gitignore` or the global `pants_ignore` option, which may result in Pants not being able to see the file(s) even though they exist on disk. Refer to https://www.pantsbuild.org/troubleshooting#pants-cannot-find-a-file-in-your-project.
21:37:19.67 [WARN] Unmatched globs from packages/hl-core/hl_core_tests/hl7:test_utils's `sources` field: ["packages/hl-core/hl_core_tests/hl7/*_test.pyi", "packages/hl-core/hl_core_tests/hl7/conftest.py", "packages/hl-core/hl_core_tests/hl7/test_*.pyi", "packages/hl-core/hl_core_tests/hl7/tests.pyi"]

Do the file(s) exist? If so, check if the file(s) are in your `.gitignore` or the global `pants_ignore` option, which may result in Pants not being able to see the file(s) even though they exist on disk. Refer to https://www.pantsbuild.org/troubleshooting#pants-cannot-find-a-file-in-your-project.
21:37:20.72 [INFO] Starting: Building dockerfile_parser.pex from resource://pants.backend.docker.subsystems/dockerfile.lock
21:37:21.81 [INFO] Completed: Building dockerfile_parser.pex from resource://pants.backend.docker.subsystems/dockerfile.lock
21:37:21.81 [ERROR] 1 Exception encountered:

Engine traceback:
  in `package` goal

ProcessExecutionFailure: Process 'Building dockerfile_parser.pex from resource://pants.backend.docker.subsystems/dockerfile.lock' failed with exit code 1.
stdout:

stderr:
There was 1 error downloading required artifacts:
1. dockerfile 3.2 from https://files.pythonhosted.org/packages/0e/de/00149a416148c609c71c8a94e5e4df14a9f62bf2fa41aeda021b76388623/dockerfile-3.2.0-cp36-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.whl
    pip: Executing /home/runner/.cache/pants/named_caches/pex_root/venvs/34c8697f20cbc61a130d2863b492cdb77bb19979/d56f87eee1e7cb14eca0b0968944a6f58d9e642e/bin/python -sE /home/runner/.cache/pants/named_caches/pex_root/venvs/34c8697f20cbc61a130d2863b492cdb77bb19979/d56f87eee1e7cb14eca0b0968944a6f58d9e642e/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -q --cache-dir /home/runner/.cache/pants/named_caches/pex_root/pip/24.0/pip_cache --log /tmp/pants-sandbox-zfg86T/.tmp/pex-pip-log.podu2hn0/pip.log download --dest /home/runner/.cache/pants/named_caches/pex_root/downloads/e6bd64408386b7ba2259d85820e0fe90de1b6b8269f560f18aba100c6aa40b7d.lck.work --no-deps https://files.pythonhosted.org/packages/0e/de/00149a416148c609c71c8a94e5e4df14a9f62bf2fa41aeda021b76388623/dockerfile-3.2.0-cp36-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.whl --index-url https://pypi.org/simple/ --retries 5 --timeout 15 failed with -11

Use `--keep-sandboxes=on_failure` to preserve the process chroot for inspection.

StealthBadger747 avatar Nov 27 '24 21:11 StealthBadger747

Your very last line tells a whole story: ... failed with -11. That means Python in that venv segfaulted. You should be able to self serve from there; i.e. investigate the underlying python binary used by that venv and see if it segfaults on its own. Then potentially clear caches if that Python venv is very old (older than libc upgrades, etc.).
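To make the reading of `-11` concrete: on POSIX, Python's `subprocess` reports a child killed by signal N as return code `-N`, and signal 11 is SIGSEGV on Linux. A minimal sketch of decoding such a code follows; the helper name `describe_returncode` is hypothetical and is not part of Pants or Pex.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess-style return code into a readable description.

    Negative values follow the POSIX subprocess convention: -N means the
    child was terminated by signal N.
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number unknown to this platform's signal module.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"
```

Passing `-11`, the value from the pip line in the log, yields a description naming SIGSEGV, while an ordinary failure such as `1` is reported as a plain exit status.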

jsirois avatar Nov 28 '24 02:11 jsirois

Sorry for the trouble.

A few questions:

  1. As @jsirois suggests, can you reproduce this outside of pants?
  2. Did something change recently that might've caused this to start crashing?

huonw avatar Dec 02 '24 05:12 huonw

Sorry for the late reply, had a lot of fires to put out the past two weeks.

  1. I have not been able to reproduce this outside of pants.
  2. Nothing changed recently as far as I'm aware. We are still using the same CI image and the pants version did not change. It actually just kinda started overnight without any code changes needed.

StealthBadger747 avatar Dec 11 '24 01:12 StealthBadger747

@StealthBadger747 hopefully my pointing out that -11 means SEGFAULT helps here. Almost surely this means a venv created by Pex (for Pants these are in ~/.cache/pants/named_caches/...) has a Python executable that segfaults. By "reproduce outside of Pants", I mean try to run that Python and see if it segfaults. You'll get a segfault when, for example, glibc is upgraded and there are old venvs lying around with Pythons that link to the older glibc. You might just try moving aside ~/.cache/pants/named_caches/ as a quick way to check if the problem goes away.
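The "run that Python and see if it segfaults" check can be sketched as a small script. This is an illustration under assumptions, not a Pants or Pex tool: the function name `find_crashing_pythons` is hypothetical, and the glob `pex_root/venvs/*/*/bin/python` is inferred only from the path shown in the log above, so the layout may differ on other setups.

```python
import subprocess
from pathlib import Path


def find_crashing_pythons(named_caches: Path, timeout: float = 30.0) -> list[tuple[Path, int]]:
    """Return (interpreter, signal_number) pairs for venv Pythons killed by a signal."""
    crashed = []
    # Assumed layout, inferred from the log: pex_root/venvs/<hash>/<hash>/bin/python
    for python in sorted(named_caches.glob("pex_root/venvs/*/*/bin/python")):
        try:
            result = subprocess.run(
                [str(python), "-c", "pass"],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            # Broken symlink, non-executable file, or a hang: not a signal death.
            continue
        if result.returncode < 0:
            # POSIX convention: -N means the child was terminated by signal N.
            crashed.append((python, -result.returncode))
    return crashed
```

Calling it with `Path.home() / ".cache/pants/named_caches"` and finding an entry with signal number 11 would point at a segfaulting interpreter, in which case moving the cache directory aside, as suggested above, is the quick test.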

jsirois avatar Dec 11 '24 01:12 jsirois

I'm setting up self-hosted runners and seeing something similar, though no segfault (no stderr at all) :(

Quickly dumping some context:

  • pantsbuild 2.25
  • Using Action Runners Controller, running on EKS
  • Not reproducible locally

I assumed the environments were the same, but somehow the workflow files need quite a few changes to run on self-hosted vs. GitHub-hosted runners.

  • Python 3.12 was available on hosted runners, but not available on the self-hosted runners using the default image
  • setup-python@v4 does not work with pantsbuild for some reason, I was getting "/home/runner/_work/_tool/Python/3.12.9/x64/bin/python3.12: error while loading shared libraries: libpython3.12.so.1.0: cannot open shared object file: No such file or directory"
  • Installing with got pants package to at least start, but then fails with the status 1, see output below:
00:48:00.97 [INFO] Starting: Building local_dists.pex
00:48:00.97 [INFO] Starting: Building dockerfile_parser.pex from resource://pants.backend.docker.subsystems/dockerfile.lock
00:48:01.06 [INFO] Canceled: Building dockerfile_parser.pex from resource://pants.backend.docker.subsystems/dockerfile.lock
00:48:01.06 [INFO] Starting: Building dockerfile_parser.pex from resource://pants.backend.docker.subsystems/dockerfile.lock
00:48:02.28 [INFO] Completed: Building dockerfile_parser.pex from resource://pants.backend.docker.subsystems/dockerfile.lock
Error: 2.30 [ERROR] 1 Exception encountered:

Engine traceback:
  in root
    ..
  in pants.core.goals.package.package_asset
    `package` goal
< Removed traceback >
pants.engine.process.ProcessExecutionFailure: Process 'Building dockerfile_parser.pex from resource://pants.backend.docker.subsystems/dockerfile.lock' failed with exit code 1.
stdout:

stderr:
received exit code 1 during execution of `['/usr/bin/python3.12', '-s', '-E', '-m', 'venv', '/tmp/pants-sandbox-bp8tux/.tmp/tmp8d52mbig/pip']` while trying to execute `['/usr/bin/python3.12', '-s', '-E', '-m', 'venv', '/tmp/pants-sandbox-bp8tux/.tmp/tmp8d52mbig/pip']`



Use `--keep-sandboxes=on_failure` to preserve the process chroot for inspection.
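Since the log shows no stderr from the failing step, one way to get more detail is to rerun the same `venv` invocation by hand and capture its output. The sketch below copies the command from the stderr line above; the function name `try_create_venv` is hypothetical, and a temporary directory stands in for the sandbox path.

```python
import subprocess
import tempfile
from pathlib import Path


def try_create_venv(interpreter: str) -> subprocess.CompletedProcess:
    """Run `<interpreter> -s -E -m venv <tmpdir>/pip` and capture its output."""
    with tempfile.TemporaryDirectory() as tmp:
        # Same flags as the command in the Pants error; only the target
        # directory differs (a throwaway temp dir instead of the sandbox).
        return subprocess.run(
            [interpreter, "-s", "-E", "-m", "venv", str(Path(tmp) / "pip")],
            capture_output=True,
            text=True,
        )
```

Running `try_create_venv("/usr/bin/python3.12")` on the runner and printing `returncode`, `stdout`, and `stderr` from the result shows whether the interpreter fails the same way outside Pants.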

andrewkho avatar Apr 09 '25 01:04 andrewkho

EDIT: I completely missed this section in troubleshooting, which was the cause of this issue for me: https://www.pantsbuild.org/dev/docs/using-pants/troubleshooting-common-issues#using-pants-on-self-hosted-github-actions-runner

For anyone else who stumbles upon this, the workaround using Action Runner Controller (ARC) was not super straightforward. I needed to update my k8s spec in the following way:

    initContainers:
    - name: update-perms
      image: ghcr.io/actions/actions-runner:latest
      command:
        - sh
        - -c
        - >
          sudo chown -R runner:runner /opt/hostedtoolcache
      volumeMounts:
        - name: hostedtoolcache
          mountPath: /opt/hostedtoolcache
...
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      env:
        - name: DOCKER_HOST
          value: unix:///var/run/docker.sock
        - name: RUNNER_TOOL_CACHE
          value: /opt/hostedtoolcache
      volumeMounts:
        - name: hostedtoolcache
          mountPath: /opt/hostedtoolcache
...
    volumes:
    - name: hostedtoolcache
      emptyDir: {}
    - name: dind-sock
      emptyDir: {}
    - name: dind-externals
      emptyDir: {}

andrewkho avatar Apr 09 '25 01:04 andrewkho