Cannot install Pants > 2.19 if home directory in `/etc/passwd` is a symlink.
Describe the bug
The pants CLI works fine in all of our environments at version 2.19. When we change the version to 2.20 or 2.21 or 2.22, we see one of two errors, although it's not clear why some environments trigger one and some another, given that they're all on the same Ubuntu (perhaps quirks of the user setup):
$ pants --keep-sandboxes=on_failure --changed-since=origin/main lint
Bootstrapping Pants 2.20.0
Installing pantsbuild.pants==2.20.0 into a virtual environment at /home/<REDACTED>/.cache/nce/60b<REDACTED>/bindings/venvs/2.20.0
Failed to create Pants virtual environment.
Error: Command '['/bulk_data/home/<REDACTED>/.cache/nce/60b<REDACTED>/bindings/pex_root/venvs/591<REDACTED>/561<REDACTED>/bin/python', '/tmp/tmpp276ax9e.pex', 'venv', '--prompt', 'Pants 2.20.0', '--compile', '--pip', '--collisions-ok', '--no-emit-warnings', '--disable-cache', '/home/<REDACTED>/.cache/nce/60b<REDACTED>/bindings/venvs/2.20.0']' returned non-zero exit status 1., output:
-----
b'Traceback (most recent call last):\n File "/bulk_data/home/<REDACTED>/.cache/nce/679<REDACTED>/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 197, in _run_module_as_main\n return _run_code(code, main_globals, None,\n File "/bulk_data/home/<REDACTED>/.cache/nce/679<REDACTED>/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 87, in _run_code\n exec(code, run_globals)\n File "/home/<REDACTED>/.pex/unzipped_pexes/9c1<REDACTED>/__main__.py", line 105, in <module>\n from pex.pex_bootstrapper import bootstrap_pex\nModuleNotFoundError: No module named \'pex\'\n'
-----
Or
Bootstrapping Pants 2.22.0
Installing pantsbuild.pants==2.22.0 into a virtual environment at /home/<REDACTED>/.cache/nce/60b<REDACTED>/bindings/venvs/2.22.0
Failed to fetch https://github.com/pantsbuild/pants/releases/download/release_2.22.0/pants.2.22.0-cp39-linux_x86_64.pex: [22] HTTP response code said error (The requested URL returned error: 404)
Wasn't able to fetch the Pants PEX at https://github.com/pantsbuild/pants/releases/download/release_2.22.0/pants.2.22.0-cp39-linux_x86_64.pex.
Check to see if the URL is reachable (i.e. GitHub isn't down) and if pants.2.22.0-cp39-linux_x86_64.pex asset exists within the release. If the asset doesn't exist it may be that this platform isn't yet supported. If that's the case, please reach out on Slack: https://www.pantsbuild.org/docs/getting-help#slack or file an issue on GitHub: https://github.com/pantsbuild/pants/issues/new/choose.
Exception:
Command '['/home/<REDACTED>/.cache/nce/226<REDACTED>/ptex-linux-x86_64', 'https://github.com/pantsbuild/pants/releases/download/release_2.22.0/pants.2.22.0-cp39-linux_x86_64.pex']' returned non-zero exit status 1.
Error: Failed to establish atomic directory /home/<REDACTED>/.cache/nce/60b<REDACTED>/locks/install-ab9<REDACTED>. Population of work directory failed: Boot binding command failed: exit status: 1
Isolates your Pants from the elements.
Please select from the following boot commands:
<default> (when SCIE_BOOT is not set in the environment) Detects the current Pants installation and launches it.
bootstrap-tools Introspection tools for the Pants bootstrap process.
update Update scie-pants.
You can select a boot command by setting the SCIE_BOOT environment variable.
Pants version 2.19 - 2.22
OS Linux:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
Additional info We've tried a bit of flailing to fix this:
- Attempting different Pants versions.
- ./get-pants.sh
- SCIE_BOOT=update pants (yields No new releases of scie-pants were found.)
- We tried to install Pex directly into the virtual environment using pip, but got:
raise MetadataError(\npex.dist_metadata.MetadataError: Failed to determine project name and version for distribution at /bulk_data/home/<REDACTED>/.pex/unzipped_pexes/ba7<REDACTED>/.deps/PyYAML-6.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.\n'
FWIW, https://github.com/pantsbuild/pants/releases/download/release_2.22.0/pants.2.22.0-cp39-linux_x86_64.pex is not an accessible address as far as I can tell.
The second error is because 2.22 has not been released yet, that is https://github.com/pantsbuild/pants/releases/tag/release_2.22.0 will also 404. (There have been several RCs and we hope it is really close, but it is not out yet!)
I am unsure about the first error. When you say "all of our environments", do you mean that it happens in a variety of environments (every developer's workstation) or in a standardized environment (that is, one with your `/bulk_data` mount)?
I hesitate to recommend this, but is deleting the cache (.cache/nce) among the things you have tried?
Ah, yes, that might have been a miscommunication. Forget about 2.22...
The main thing is: Tried tearing down .cache/nce, actually all of .cache and .pex and /tmp/*pants*, the whole box, but no go. 2.19.0 installs fine, 2.20 and 2.21 fail, always with the same ModuleNotFoundError.
Just to be sure it wasn't some quirk of config in our repo, we even cleared everything out including any Pants binary and did the following:
curl --proto '=https' --tlsv1.2 -fsSL https://static.pantsbuild.org/setup/get-pants.sh | bash
mkdir test-pants
cd test-pants
echo '[GLOBAL]' > pants.toml
echo 'pants_version = "2.21.0"' >> pants.toml
pants repl .
And still see the same error on these boxes.
Any ideas how we could get this to spit out some more useful logs about what's happening perhaps?
Can you confirm what version of the scie-pants bootloader this box has?
$ PANTS_BOOTSTRAP_VERSION=report pants
As for more useful logs: I am not sure they would have much more than the earlier output, but you can look at whatever find ~/.cache/nce/ -iname '*log' turns up (e.g. install.log).
This is what we see for the bootloader:
$ PANTS_BOOTSTRAP_VERSION=report pants
0.12.0
Only an install.log (no configure.log or pants-install.log on this box like we see on a healthy system), and it just reiterates what's in the dump.
Okay, a bit more context: We were able to reproduce this on a completely clean Ubuntu image on AWS with a new Pants repo: ami-0d486650b94f4c69b. Not sure whether that's pointing to some networking configuration around it (perhaps it's failing to fetch something silently?).
We found the source of the issue. The Pants installer isn't correctly handling a symlinked home directory. In particular, in /etc/passwd, the home directory was listed as a symlink /home/<user> instead of /bulk_data/home/<user>.
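For anyone wanting to check whether they are in the same situation, here is a minimal sketch that compares the home directory recorded in the passwd database (and `~/.cache` under it) with its fully resolved physical path. The helper name `symlink_mismatch` is made up for this illustration and is not part of Pants, Pex, or scie-pants. It only reports a mismatch; it does not prove that the mismatch is what breaks the install.

```python
import os
import pwd  # Unix-only; reads the same data as /etc/passwd


def symlink_mismatch(path):
    """Return (logical, physical) if `path` resolves through a symlink, else None.

    Hypothetical helper for illustration only.
    """
    logical = os.path.abspath(path)
    physical = os.path.realpath(path)
    return None if logical == physical else (logical, physical)


if __name__ == "__main__":
    # Home directory as listed in the passwd database for the current user.
    home = pwd.getpwuid(os.getuid()).pw_dir
    for candidate in (home, os.path.join(home, ".cache")):
        mismatch = symlink_mismatch(candidate)
        if mismatch:
            print("symlinked: %s -> %s" % mismatch)
        else:
            print("not symlinked: %s" % candidate)
```

If either line reports `symlinked`, the logical and physical paths differ, which is the precondition described in this thread.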
That looks frustratingly deep to debug; glad you found it!
I have a similar observation / problem, but not with a symlinked home directory.
In my case, it's when ~/.cache is symlinked to a directory named .cache more than 3 levels deep in /tmp. This is the case in a number of 3rd party SDLC images such as renovate/renovate:39.62 but I can replicate this on ubuntu:22.04
- fails - /home/ubuntu/.cache is a symlink to /tmp/test/cache/.cache
- fails - /home/ubuntu/.cache is a symlink to /tmp/test/something/.cache
- succeeds - /home/ubuntu/.cache is a symlink to /tmp/test/.cache
In my case, 2.22.0 works correctly, but 2.23.0 fails (see below). No changes are required to make 2.22.0 work correctly.
I'm creating the test environment with docker run --rm --name foo --volume ./myrepo:/tmp/myrepo --entrypoint /bin/sleep ubuntu:24.04 9000
nested directory at /tmp/test/cache/.cache - fails
This set of steps fails with the failure message below for 2.23.0 and works for 2.22.0
Get shell in container with docker exec -it -u root foo /bin/bash and then
apt update && apt install -y curl
mkdir -p /tmp/test/cache/.cache
chmod 777 /tmp/test /tmp/test/cache /tmp/test/cache/.cache
if [ -L /home/ubuntu/.cache ]; then rm /home/ubuntu/.cache; fi
ln -s /tmp/test/cache/.cache /home/ubuntu/.cache
ls -ld /tmp/test /tmp/test/cache /tmp/test/cache/.cache /home/ubuntu/.cache
lrwxrwxrwx 1 root root 22 Dec 12 14:41 /home/ubuntu/.cache -> /tmp/test/cache/.cache
drwxrwxrwx 3 root root 4096 Dec 12 14:40 /tmp/test
drwxrwxrwx 3 root root 4096 Dec 12 14:40 /tmp/test/cache
drwxrwxrwx 2 root root 4096 Dec 12 14:40 /tmp/test/cache/.cache
su - ubuntu
cd /tmp/myrepo
curl --proto '=https' --tlsv1.2 -fsSL https://static.pantsbuild.org/setup/get-pants.sh | bash
/home/ubuntu/.local/bin/pants version
less nested .cache directory inside /tmp/test - works
Before running this test, terminate the container and restart it to return filesystem to default.
Get shell in container with docker exec -it -u root foo /bin/bash and then
apt update && apt install -y curl
mkdir -p /tmp/test/.cache
chmod 777 /tmp/test /tmp/test/.cache
if [ -L /home/ubuntu/.cache ]; then rm /home/ubuntu/.cache; fi
ln -s /tmp/test/.cache /home/ubuntu/.cache
ls -ld /tmp/test /tmp/test/.cache /home/ubuntu/.cache
lrwxrwxrwx 1 root root 16 Dec 12 14:56 /home/ubuntu/.cache -> /tmp/test/.cache
drwxrwxrwx 3 root root 4096 Dec 12 14:56 /tmp/test
drwxrwxrwx 2 root root 4096 Dec 12 14:56 /tmp/test/.cache
su - ubuntu
cd /tmp/myrepo
curl --proto '=https' --tlsv1.2 -fsSL https://static.pantsbuild.org/setup/get-pants.sh | bash
/home/ubuntu/.local/bin/pants version
2.23.0 failure message
Bootstrapping Pants 2.23.0
Installing pantsbuild.pants==2.23.0 into a virtual environment at /home/ubuntu/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/bindings/venvs/2.23.0
Failed to create Pants virtual environment.
Error: Command '['/tmp/containerbase/cache/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/bindings/pex_root/venvs/e9325278eb97b235cc28d540e6599e7f5e69fa25/ef0210ddc65deea0460a3aa02dbb08eab37714fc/bin/python', '/tmp/tmpsaa54k0r.pex', 'venv', '--prompt', 'Pants 2.23.0', '--compile', '--pip', '--collisions-ok', '--no-emit-warnings', '--disable-cache', '/home/ubuntu/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/bindings/venvs/2.23.0']' returned non-zero exit status 1., output:
-----
b'Traceback (most recent call last):\n File "/tmp/containerbase/cache/.cache/nce/7d19e1ecd6e582423f7c74a0c67491eaa982ce9d5c5f35f0e4289f83127abcb8/cpython-3.9.18+20240107-aarch64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 197, in _run_module_as_main\n return _run_code(code, main_globals, None,\n File "/tmp/containerbase/cache/.cache/nce/7d19e1ecd6e582423f7c74a0c67491eaa982ce9d5c5f35f0e4289f83127abcb8/cpython-3.9.18+20240107-aarch64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 87, in _run_code\n exec(code, run_globals)\n File "/home/ubuntu/.cache/pex/unzipped_pexes/5d3ad1f48b31f75a4afacd01941825831b5cd152/__main__.py", line 227, in <module>\n result, should_exit, is_globals = boot(\n File "/home/ubuntu/.cache/pex/unzipped_pexes/5d3ad1f48b31f75a4afacd01941825831b5cd152/__main__.py", line 216, in boot\n from pex.globals import Globals\nModuleNotFoundError: No module named \'pex\'\n'
-----
Error: Failed to establish atomic directory /home/ubuntu/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/locks/install-9a7e34655c6fec617f37def0a028aa5075179a3c657b2dec3e986db78c2a89a3. Population of work directory failed: Boot binding command failed: exit status: 1
2.23.0 success message
When using /tmp/test/.cache, 2.23.0 succeeds with
Bootstrapping Pants 2.23.0
Installing pantsbuild.pants==2.23.0 into a virtual environment at /home/ubuntu/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/bindings/venvs/2.23.0
New virtual environment successfully created at /home/ubuntu/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/bindings/venvs/2.23.0
14:47:19.51 [INFO] Initializing scheduler...
14:47:19.53 [INFO] Initializing Nailgun pool for 8 processes...
14:47:20.66 [INFO] Scheduler initialized.
2.23.0
14:47:24.08 [WARN] Executor shutdown took unexpectedly long: tasks were likely leaked!
Thanks for the specific repro instructions. I now reproduce this locally, so will dig in.
This is because pex computes a relative path of the "physical" cache dir /tmp/test/cache/.cache/pex/bootstraps/ff146c4e9ca2a34371658662bfd7c7714bba8c10 relative to the symlinked dir /home/ubuntu/.cache/pex/unzipped_pexes/5d3ad1f48b31f75a4afacd01941825831b5cd152. So we end up with ../../../../../../tmp/test/cache/.cache/pex/bootstraps/ff146c4e9ca2a34371658662bfd7c7714bba8c10 as the relpath (the intention was for that to be ../../bootstraps/ff146c4e9ca2a34371658662bfd7c7714bba8c10).
And so if paths under the symlink are at a different depth than the corresponding "physical" paths, that relpath is incorrect. If they are at the same depth then that ../../../../../../ prefix will climb up to the filesystem root and back down again, which is not what was intended, but will happen to work.
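To make the depth argument concrete, here is a small sketch using pure path arithmetic (nothing touches the filesystem). It is not Pex's code: the function names and the shortened hashes `5d3a` and `ff14` are placeholders, and it assumes the physical paths contain no further symlinks. It computes the relative link text from the symlinked ("logical") directory, then resolves that text from the directory where the link physically lives, which is how the kernel follows a relative symlink.

```python
import posixpath


def resolve_relative_link(logical_dir, physical_dir, physical_target):
    """Return (link text, where that text points when followed from physical_dir)."""
    # Link text computed with the un-resolved (symlinked) directory as the start.
    rel = posixpath.relpath(physical_target, start=logical_dir)
    # A relative symlink is followed from the physical location of the link.
    resolved = posixpath.normpath(posixpath.join(physical_dir, rel))
    return rel, resolved


def link_works(physical_cache):
    """True if the link still reaches its target for this physical ~/.cache location."""
    logical_dir = "/home/ubuntu/.cache/pex/unzipped_pexes/5d3a"  # placeholder hash
    physical_dir = physical_cache + "/pex/unzipped_pexes/5d3a"
    target = physical_cache + "/pex/bootstraps/ff14"  # placeholder hash
    _, resolved = resolve_relative_link(logical_dir, physical_dir, target)
    return resolved == target


if __name__ == "__main__":
    for cache in ("/tmp/test/cache/.cache", "/tmp/test/something/.cache", "/tmp/test/.cache"):
        print(cache, "works" if link_works(cache) else "fails")
```

The three layouts reported earlier in the thread come out the same way here: the two physical caches that sit one level deeper than /home/ubuntu/.cache fail, and /tmp/test/.cache, which is at the same depth, works.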
The underlying issue is that we os.path.realpath the pex_root in most cases, but not in at least one case (here it's the fallback value in a call to Variables.PEX_ROOT.value_or(...)).
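As a rough illustration of why canonicalizing the root matters, the sketch below builds a throwaway directory tree with a symlinked cache and creates the relative link twice: once with a mixed logical/physical pair of paths, and once after passing the root through os.path.realpath. This is an assumption-laden toy, not the actual Pex change; `make_bootstrap_link`, `demo`, and the directory names are invented for the example.

```python
import os
import shutil
import tempfile


def make_bootstrap_link(pex_root, canonicalize, name):
    """Create a relative symlink to the bootstrap dir; return True if it resolves."""
    physical_root = os.path.realpath(pex_root)
    # The bug scenario: the link directory is derived from the un-resolved root.
    root = physical_root if canonicalize else pex_root
    link_dir = os.path.join(root, "unzipped_pexes", name)
    target = os.path.join(physical_root, "bootstraps", "ff14")
    os.makedirs(link_dir)
    os.makedirs(target, exist_ok=True)
    link = os.path.join(link_dir, "bootstrap")
    os.symlink(os.path.relpath(target, start=link_dir), link)
    return os.path.isdir(link)  # False when the symlink dangles


def demo():
    base = os.path.realpath(tempfile.mkdtemp())
    try:
        # The physical cache is one level deeper than the symlink that points at it.
        physical_cache = os.path.join(base, "deep", "er", ".cache")
        os.makedirs(physical_cache)
        os.makedirs(os.path.join(base, "home"))
        logical_cache = os.path.join(base, "home", ".cache")
        os.symlink(physical_cache, logical_cache)
        pex_root = os.path.join(logical_cache, "pex")
        return {
            "mixed": make_bootstrap_link(pex_root, False, "aaaa"),
            "canonical": make_bootstrap_link(pex_root, True, "bbbb"),
        }
    finally:
        shutil.rmtree(base)


if __name__ == "__main__":
    print(demo())
```

With both paths taken from the resolved root, the link text is the short ../../bootstraps/... form and no longer depends on how deep the symlink target is.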
I will file and fix over in pex.
Fixed here: https://github.com/pex-tool/pex/pull/2626
https://github.com/pantsbuild/pants/pull/21762 upgrades Pants to use Pex 2.27.1, which includes this fix. This should go out in the next dev release of Pants (2.25.0.dev2).
If you need this fix in an earlier version of Pants you can manually update the Pex version in config:
[pex-cli]
version = "v2.27.1"
known_versions.add = [
"v2.27.1|macos_arm64 |013a824e5af50f9687f765a43e8ffe94b4faa4fe795d017333c687bf943a4a68|4369121",
"v2.27.1|macos_x86_64|013a824e5af50f9687f765a43e8ffe94b4faa4fe795d017333c687bf943a4a68|4369121",
"v2.27.1|linux_arm64 |013a824e5af50f9687f765a43e8ffe94b4faa4fe795d017333c687bf943a4a68|4369121",
"v2.27.1|linux_x86_64|013a824e5af50f9687f765a43e8ffe94b4faa4fe795d017333c687bf943a4a68|4369121",
]
@benjyw I have also been hitting this issue for a long time. Happy to see some progress and effort made here. However, I just tried Pants 2.23.0 with Pex 2.27.1 and I still have the same issue. Below is my sample log.
Downloading https://github.com/pantsbuild/pants/releases/download/release_2.23.0/pants.2.23.0-cp39
Downloading https://github.com/pantsbuild/pants/releases/download/release_2.23.0/pants.2.23.0-cp39
Traceback (most recent call last):
File "/data/user/<userid>/.cache/nce/nce/f3ff38b1ccae7dcebd8bbf2e533c9a984fac881de0ffd1636fbb61842bd924de/cpython-3.9.18+20231002-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data/user/<userid>/.cache/nce/nce/f3ff38b1ccae7dcebd8bbf2e533c9a984fac881de0ffd1636fbb61842bd924de/cpython-3.9.18+20231002-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/users/<userid>/.cache/pex/unzipped_pexes/7be46b58df48a5f579dd66d7f7ed2f32307063b5/__main__.py", line 227, in <module>
result, should_exit, is_globals = boot(
File "/users/<userid>/.cache/pex/unzipped_pexes/7be46b58df48a5f579dd66d7f7ed2f32307063b5/__main__.py", line 216, in boot
from pex.globals import Globals
ModuleNotFoundError: No module named 'pex'
When I look inside /users/<userid>/.cache/pex/unzipped_pexes/7be46b58df48a5f579dd66d7f7ed2f32307063b5, this is what I see:
[.... 7be46b58df48a5f579dd66d7f7ed2f32307063b5]$ ls -l
total 20
-rwxr-xr-x. 1 <userid> <usergroup> 7919 Dec 17 17:37 __main__.py
lrwxrwxrwx. 1 <userid> <usergroup> 104 Dec 17 17:37 __pex__ -> ../../../../../../<xyz>/home/<userid>/.cache/pex/user_code/68b87e96476955d5120f0cbfa1ef1141290ead52/__pex__
-rw-r--r--. 1 <userid> <usergroup> 3890 Dec 17 17:37 PEX-INFO
-rw-rw----. 1 <userid> <usergroup>    6 Dec 17 17:37 PEX-LAYOUT
drwxrwx---. 2 <userid> <usergroup> 4096 Dec 17 17:37 __pycache__
It seems to me that __pex__ -> ../../../../../../<xyz>/home/<userid>/.cache/pex/user_code/68b87e96476955d5120f0cbfa1ef1141290ead52/__pex__ is still broken.
In my case /users/<userid> is the same as <xyz>/home/<userid>. I guess that is why the link was created? Do you have any clue about the issue here?
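A quick way to confirm that links like `__pex__` are dangling, rather than reading `ls -l` output by eye, is to walk the unzipped PEX directory and list every symlink whose target does not exist. This is a generic sketch; `find_dangling_symlinks` is a hypothetical helper, and the default path is simply the one that appears in the tracebacks above.

```python
import os


def find_dangling_symlinks(root):
    """Return sorted (link path, link text) pairs for symlinks under root that do not resolve."""
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists follows symlinks, so it is False for a broken link.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append((path, os.readlink(path)))
    return sorted(dangling)


if __name__ == "__main__":
    root = os.path.expanduser("~/.cache/pex/unzipped_pexes")
    for path, text in find_dangling_symlinks(root):
        print("%s -> %s (dangling)" % (path, text))
```

An empty result for a directory that Pants has just tried to bootstrap into would suggest the failure has a different cause than the one discussed here.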
I've just realized that upgrading pex in an existing version of Pants won't help, because this relates to the version of Pex that we package Pants with.
Can you try upgrading to the just-released 2.25.0.dev2?
Thank you @benjyw . I can confirm that 2.25.0.dev2 solves this issue 💯 . I will wait for the official release.