rules_python
rules_python copied to clipboard
ignore .pyc files in py_library() targets created in pip_install_dependencies()
.pyc files are not hermetic and ruin hashes for remote-caching. Observed with pypi__pip and pypi__setuptools
PR Checklist
Please check if your PR fulfills the following requirements:
- [ ] Tests for the changes have been added (for bug fixes / features)
- [ ] Docs have been added / updated (for bug fixes / features)
PR Type
What kind of change does this PR introduce?
- [ X ] Bugfix
What is the current behavior?
Issue Number: N/A
The py_library()
ies created by pip_install_requirements()
have checksums for .pyc
files included. These files don't match from one install to the next, and because they are build inputs they guarantee a remote-cache
miss.
I noticed this because the requirements_test
target that's set up by compile_pip_requirements()
is never a cache hit in my CI pipeline.
Running bazel with --execution_log_json_file=
allowed me to debug. When run twice on the same machine with a remote-cache, even with a bazel clean
in between still yields a cache hit.
But when running bazel on 2 different executors (docker containers), then diffing the runs yields only file hash changes:
--- exec1.json 2022-09-12 15:29:38.000000000 +0900
+++ exec1-1.json 2022-09-12 15:34:03.000000000 +0900
@@ -2343,21 +2343,21 @@
}, {
"path": "external/pypi__pip/pip/_internal/utils/__pycache__/compat.cpython-310.pyc",
"digest": {
- "hash": "72f8ee456dd18c7b6b370e7705c1f4a254b56e9291182cf40445e2be6fbd0ef5",
+ "hash": "6691abe643cd9e5aae626749720fa8c49d2b3bb3ed3e653e3a90378336b096ad",
"sizeBytes": "1534",
"hashFunctionName": "SHA-256"
}
}, {
"path": "external/pypi__pip/pip/_internal/utils/__pycache__/compat.cpython-39.pyc",
"digest": {
- "hash": "d1933621a8710b6b017e6b8c6516e81bf5c99a5f22e2aaac45161f13e5b9a501",
+ "hash": "0078e3ce171c2162f656db8fe2b40bcfc27ae6d076f6194366ae85b027fd1603",
"sizeBytes": "1535",
"hashFunctionName": "SHA-256"
}
}, {
"path": "external/pypi__pip/pip/_internal/utils/__pycache__/compatibility_tags.cpython-310.pyc",
"digest": {
- "hash": "6b7d2b0c22492c289a93fbc34e11fea1f4c8941916b314074e515af2f0063ace",
+ "hash": "7854acaf76bb430d0e57dcc878824114b5fcf1f44735c129bdcfc85298d82b0a",
"sizeBytes": "4103",
"hashFunctionName": "SHA-256"
}
...
It seems to only include hashes from pypi__pip
and pypi__setuptools
.
What is the new behavior?
With my patch applied, the remote-cache is a hit on differing CI executors when previously it was not.
Does this PR introduce a breaking change?
- [ ] Yes
- [ X ] No
Other information
I'm nearly certain these wheels don't ship with bytecode. I think the byte code is getting generated when we use these external repos during repository rule execution. Can you try adding -B to this args list
This is related to https://github.com/bazelbuild/rules_python/pull/749. Please, see the related PRs in that description.
Please, reopen if running as non-root doesn't solve your problem.
I'm not running bazel
as root, but I'll double back and debug this again
I verified I'm not running this as root
or with any god-like container powers. It's a builder
user id 1337 in circleci's x86 container deployment.
WRT https://github.com/bazelbuild/rules_python/pull/749 the chown on lib
in the hermetic python install isn't going to affect the wheel install directory under pypi__setuptools/
... so I don't think it's related?
@hrfuller passing -B
does fix this for me. I copied the "ignore *.pyc" pattern from https://github.com/bazelbuild/rules_python/blob/main/python/pip_install/extract_wheels/bazel.py#L140 , but I could do a PR to whack this mole in the -B
way if it's preferred.
@f0rmiga I don't know how to "re-open" this. Do I submit another PR, file an issue, both?
I reopened. You can just update this PR so we keep the history.
Looking at the -B
solution again, it works for my circleci use case where the test is run with a fresh cache and nothing is run before bazel test
, but upon further inspection it doesn't work for subsequent runs of the same target.
This is because the first run builds the .pyc
files, and then they're subsequently included in the dep.
No remote cache, I rm -rf
'd the local cache dir. A bazel query with the -B
change applied yields:
laz@builder-git_bando ~/git/bando$ bazel query 'deps(@pypi__pip//:lib)' | grep pyc
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
But if I run a target that has it listed as a dependency, the .pyc
files get created. The file glob picks them up the next time:
laz@builder-git_bando ~/git/bando$ bazel test //build:requirements_test
(23:53:46) INFO: Current date is 2022-10-12
(23:53:58) INFO: Analyzed target //build:requirements_test (67 packages loaded, 4020 targets configured).
(23:53:58) INFO: Found 1 test target...
Target //build:requirements_test up-to-date:
bazel-bin/build/requirements_test
(23:56:33) INFO: Elapsed time: 167.735s, Critical Path: 154.48s
(23:56:33) INFO: 5 processes: 3 internal, 2 processwrapper-sandbox.
(23:56:33) INFO: Build completed successfully, 5 total actions
//build:requirements_test PASSED in 153.6s
(23:56:33) INFO: Build completed successfully, 5 total actions
laz@builder-git_bando ~/git/bando$ bazel query 'deps(@pypi__pip//:lib)' | grep pyc
Loading: 0 packages loaded
@pypi__pip//:pip/__pycache__/__init__.cpython-310.pyc
@pypi__pip//:pip/__pycache__/__main__.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/__init__.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/build_env.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/cache.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/configuration.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/exceptions.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/pyproject.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/self_outdated_check.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/wheel_builder.cpython-310.pyc
@pypi__pip//:pip/_internal/cli/__pycache__/__init__.cpython-310.pyc
@pypi__pip//:pip/_internal/cli/__pycache__/autocompletion.cpython-310.pyc
@pypi__pip//:pip/_internal/cli/__pycache__/base_command.cpython-310.pyc
So for local caching, adding -B
breaks test execution caching between executions 1 and 2, but the .pyc
files don't change so the cache will be a hit on subsequent runs.
For remote caching, -B
fixes the first invocation of the test b/c the .pyc
files aren't included as deps just after the wheel install, but it will break the cache on subsequent runs because they get deposited on the FS when code is run.
Unless you would prefer doing readonly chowns on every wheel install, like we do with the hermetic python, I think ignoring the .pyc
files in the glob is the proper fix and this PR stands as-is?
I don't think making these .pyc
files read-only makes sense as it does with the hermetic toolchain. Probably ignoring them from the glob is better in this case.
@mattem what do you think?
@lazcamus could you resolve the conflicts, please? I'll go ahead and merge this.
the conflicts are from it being fixed by https://github.com/bazelbuild/rules_python/pull/881 ... guess I should have added more comments ;)
closing this PR :)