Remote-apis testing: pants is failing since 2.16
Describe the bug Since 2.16 (commit 27fc9ee7761e61f3c5c9b502d612df5f1f13e29b in https://github.com/pantsbuild/example-python), Pants appears to behave incorrectly when used for remote execution.
See https://gitlab.com/remote-apis-testing/remote-apis-testing/-/merge_requests/362
Pants version The issue is present from 2.16 through the current release, 2.19 (commit f37c500e4f4e0c67e29aa9434b1b414f333bdd79 in https://github.com/pantsbuild/example-python)
OS Linux
Additional info We have upgraded to the latest release (2.19) in https://remote-apis-testing.gitlab.io/remote-apis-testing/ to show the current status for now. Please let us know if this is a known issue, and we will upgrade as soon as a new tag is available.
Thanks for flagging this.
Is there a summary of how we can run the remote-apis-testing tests locally, to reproduce this? I see that https://gitlab.com/remote-apis-testing/remote-apis-testing/-/blob/master/CONTRIBUTING.md seems to be focused on augmenting the CI pipeline, although peeking at that pipeline does suggest that maybe `docker-compose/run.sh` is what we should be starting with?
> Is there a summary of how we can run the remote-apis-testing tests locally, to reproduce this?
To run a test with Pants locally, you can `cd docker-compose` and run `./run.sh -g -c pants.yml -s <server>.yml` (the server options being buildbarn, buildfarm, and buildgrid). After the first run it is no longer necessary to specify the `-g` option, which is used to generate the Docker Compose YAML files. You can make changes to the generated files if necessary and then re-run the same command without that flag to test the changes.

I hope this helps!
The logs of one of the test runs report that the RE server fails to find the Python 3.9 executable in `/root/.cache/nce/60b51…8fda/…`.
I see a similar error when testing locally with NativeLink as the underlying REAPI server. It looks like Pants submits an absolute path into the client's home directory to the remote execution server. The server is unable to find that path locally, because it runs on a different machine. The log below shows the server output from my test: Pants runs as user michael, but there is no user michael in the executor container.
```
nativelink_executor-1 | 2024-05-03T04:41:00.194254Z ERROR nativelink_worker::local_worker: Error executing action, err: Error { code: NotFound, messages: ["No such file or directory (os error 2)", "Could not execute command [\"/home/michael/.cache/nce/fa6ec1ff473e58cf7dff9577ae94c2bde6bf1c7a837c75b928b414c0195eb80e/bindings/venvs/2.20.0/bin/python3.9\", \"./pex\", \"--tmpdir\", \".tmp\", \"--no-emit-warnings\", \"--pip-version\", \"23.1.2\", \"--python-path\", \"\", \"--output-file\", \"local_dists.pex\", \"--intransitive\", \"--interpreter-constraint\", \"CPython==3.11.*\", \"--sources-directory=source_files\", \"--no-pypi\", \"--index=https://pypi.org/simple/\", \"--manylinux\", \"manylinux2014\", \"--resolver-version\", \"pip-2020-resolver\", \"--layout\", \"zipapp\"]"] }
```
I assume something similar is happening in the remote-apis-testing repository, except that the mismatch in the contents of `/root/.cache` isn't apparent, because both the local and remote environments are presumably run as root.
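To make the failure mode concrete, here is a minimal sketch (not Pants code; `command_is_portable` is a hypothetical helper, and the paths are illustrative) of the property being violated: a remote execution command should only reference files staged in the sandbox input root, never absolute paths on the client machine.

```python
import os

def command_is_portable(argv):
    """Return True if argv[0] is a relative path, i.e. resolvable inside
    the remote sandbox's input root rather than on the client machine."""
    return not os.path.isabs(argv[0])

# The shape of the command Pants sent in the log above: an absolute path
# into the client's home directory, which does not exist on the executor.
bad = ["/home/user/.cache/nce/<hash>/bin/python3.9", "./pex"]
# A portable command would reference a file staged in the input root.
good = ["./python-bootstrap/bin/python3.9", "./pex"]

assert not command_is_portable(bad)
assert command_is_portable(good)
```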
Ah okay, that's a handy smoking gun. Thank you!
It looks like the process invocation is referencing the absolute path to the `~/.cache/nce/.../python3.9` binary that scie-pants provides (the "bootstrap Python"). I thus suspect that this was caused by #18433 (cherry-picked back to 2.16 in #18495), which switched us to running PEX with that Python interpreter, rather than one managed more "normally".
@thejcannon (as author) @stuhood (as reviewer): do you have any insight into how we might fix this remote execution issue?
The `get_python_for_scripts` rule should be computing the path of the unpacked digest in the remote environment (and not the local path). Either that logic is wrong, or remote execution is incorrectly having that code return the local path.
Thanks for the pointer! The rule you mentioned clearly distinguishes between remote and local environments: https://github.com/pantsbuild/pants/blob/a40f2fdd52542274b716ad7829b597303e030ff6/src/python/pants/core/util_rules/adhoc_binaries.py#L50-L57
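For intuition, here is a hedged sketch (not the actual Pants source; the attribute name `can_access_local_system_paths` and the relative sandbox path are assumptions) of the branching those lines are expected to implement:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnvironmentTarget:
    """Stand-in for Pants' EnvironmentTarget; the attribute name here is
    assumed for illustration."""
    can_access_local_system_paths: bool

def python_path_for(env_tgt: EnvironmentTarget, local_python: str) -> str:
    if env_tgt.can_access_local_system_paths:
        # Local execution: the scie-pants bootstrap interpreter on disk.
        return local_python
    # Remote execution: a relative path resolved inside the sandbox,
    # where the downloaded interpreter digest was materialized.
    return "./python_build_standalone/bin/python3"

local = EnvironmentTarget(can_access_local_system_paths=True)
remote = EnvironmentTarget(can_access_local_system_paths=False)
assert python_path_for(local, "/home/user/.cache/nce/x/python3.9").startswith("/home")
assert not python_path_for(remote, "/home/user/.cache/nce/x/python3.9").startswith("/")
```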
When `get_python_for_scripts` requests `_PythonBuildStandaloneBinary` in line 55, I expect the engine to run the `download_python_binary` rule:
https://github.com/pantsbuild/pants/blob/a40f2fdd52542274b716ad7829b597303e030ff6/src/python/pants/core/util_rules/adhoc_binaries.py#L60-L68
The Python binary download should emit a log message, but running Pants with `-ltrace` doesn't seem to emit it. Consequently, it looks like the `EnvironmentTarget` passed to `get_python_for_scripts`, or the corresponding `if` clause, has an issue.
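If that's the case, the suspected failure mode could be sketched like this (purely illustrative, not Pants source; `None` stands in for "no environment target configured", which may be the state when only the global `remote_execution` option is set):

```python
def interpreter_path(env_tgt, local_python: str) -> str:
    """Illustrative sketch of the suspected bug: with no environment
    target set, the rule falls into the "local" branch and embeds the
    client-side interpreter path even though execution is remote."""
    if env_tgt is None or env_tgt == "local":
        # Local branch: absolute path into the client's ~/.cache/nce.
        return local_python
    # Remote branch: relative path materialized in the sandbox.
    return "./python_bootstrap/bin/python3"

# No environment target => the local path leaks into the remote command.
leaked = interpreter_path(None, "/home/user/.cache/nce/abc/bin/python3.9")
assert leaked == "/home/user/.cache/nce/abc/bin/python3.9"
```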
Here's a reproducer. Start an REAPI server in one terminal:

```shell
git clone https://github.com/TraceMachina/nativelink.git
cd nativelink/deployment-examples/docker-compose
docker compose up -d --build
docker compose logs -f
```
Start Pants in another terminal:

```shell
cat << EOF > pants.toml
[GLOBAL]
pants_version = "2.20.0"
backend_packages = [
  "pants.backend.python",
]
remote_execution = true
remote_store_address = "grpc://127.0.0.1:50051"
remote_execution_address = "grpc://127.0.0.1:50052"
remote_instance_name = "main"
process_execution_remote_parallelism = 1

[python]
interpreter_constraints = ["==3.11.*"]
EOF

cat << EOF > app.py
print("hello")
EOF

cat << EOF > BUILD
python_sources()
EOF

pants -ltrace --no-pantsd --no-local-cache run app.py
```