rules_python icon indicating copy to clipboard operation
rules_python copied to clipboard

ignore .pyc files in py_library() targets created in pip_install_dependencies()

Open lazcamus opened this issue 2 years ago • 1 comments

.pyc files are not hermetic and ruin hashes for remote-caching. Observed with pypi__pip and pypi__setuptools

PR Checklist

Please check if your PR fulfills the following requirements:

  • [ ] Tests for the changes have been added (for bug fixes / features)
  • [ ] Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • [ X ] Bugfix

What is the current behavior?

Issue Number: N/A

The py_library()ies created by pip_install_requirements() have checksums for .pyc files included. These files don't match from one install to the next, and because they are build inputs they guarantee a remote-cache miss.

I noticed this because the requirements_test target that's set up by compile_pip_requirements() is never a cache hit in my CI pipeline.

Running bazel with --execution_log_json_file= allowed me to debug. When run twice on the same machine with a remote-cache, even with a bazel clean in between still yields a cache hit.

But when running bazel on 2 different executors (docker containers), then diffing the runs yields only file hash changes:

--- exec1.json  2022-09-12 15:29:38.000000000 +0900
+++ exec1-1.json        2022-09-12 15:34:03.000000000 +0900
@@ -2343,21 +2343,21 @@
   }, {
     "path": "external/pypi__pip/pip/_internal/utils/__pycache__/compat.cpython-310.pyc",
     "digest": {
-      "hash": "72f8ee456dd18c7b6b370e7705c1f4a254b56e9291182cf40445e2be6fbd0ef5",
+      "hash": "6691abe643cd9e5aae626749720fa8c49d2b3bb3ed3e653e3a90378336b096ad",
       "sizeBytes": "1534",
       "hashFunctionName": "SHA-256"
     }
   }, {
     "path": "external/pypi__pip/pip/_internal/utils/__pycache__/compat.cpython-39.pyc",
     "digest": {
-      "hash": "d1933621a8710b6b017e6b8c6516e81bf5c99a5f22e2aaac45161f13e5b9a501",
+      "hash": "0078e3ce171c2162f656db8fe2b40bcfc27ae6d076f6194366ae85b027fd1603",
       "sizeBytes": "1535",
       "hashFunctionName": "SHA-256"
     }
   }, {
     "path": "external/pypi__pip/pip/_internal/utils/__pycache__/compatibility_tags.cpython-310.pyc",
     "digest": {
-      "hash": "6b7d2b0c22492c289a93fbc34e11fea1f4c8941916b314074e515af2f0063ace",
+      "hash": "7854acaf76bb430d0e57dcc878824114b5fcf1f44735c129bdcfc85298d82b0a",
       "sizeBytes": "4103",
       "hashFunctionName": "SHA-256"
     }
...

It seems to only include hashes from pypi__pip and pypi__setuptools.

What is the new behavior?

With my patch applied, the remote-cache is a hit on differing CI executors when previously it was not.

Does this PR introduce a breaking change?

  • [ ] Yes
  • [ X ] No

Other information

lazcamus avatar Sep 12 '22 07:09 lazcamus

I'm nearly certain these wheels don't ship with bytecode. I think the byte code is getting generated when we use these external repos during repository rule execution. Can you try adding -B to this args list

hrfuller avatar Sep 20 '22 03:09 hrfuller

This is related to https://github.com/bazelbuild/rules_python/pull/749. Please, see the related PRs in that description.

f0rmiga avatar Oct 10 '22 01:10 f0rmiga

Please, reopen if running as non-root doesn't solve your problem.

f0rmiga avatar Oct 10 '22 02:10 f0rmiga

I'm not running bazel as root, but I'll double back and debug this again

lazcamus avatar Oct 12 '22 07:10 lazcamus

I verified I'm not running this as root or with any god-like container powers. It's a builder user id 1337 in circleci's x86 container deployment.

WRT https://github.com/bazelbuild/rules_python/pull/749 the chown on lib in the hermetic python install isn't going to affect the wheel install directory under pypi__setuptools/ ... so I don't think it's related?

@hrfuller passing -B does fix this for me. I copied the "ignore *.pyc" pattern from https://github.com/bazelbuild/rules_python/blob/main/python/pip_install/extract_wheels/bazel.py#L140 , but I could do a PR to whack this mole in the -B way if it's preferred.

@f0rmiga I don't know how to "re-open" this. Do I submit another PR, file an issue, both?

lazcamus avatar Oct 12 '22 13:10 lazcamus

I reopened. You can just update this PR so we keep the history.

f0rmiga avatar Oct 12 '22 16:10 f0rmiga

Looking at the -B solution again, it works for my circleci use case where the test is run with a fresh cache and nothing is run before bazel test, but upon further inspection it doesn't work for subsequent runs of the same target.

This is because the first run builds the .pyc files, and then they're subsequently included in the dep.

No remote cache, I rm -rf'd the local cache dir. A bazel query with the -B change applied yields:

laz@builder-git_bando ~/git/bando$ bazel query 'deps(@pypi__pip//:lib)' | grep pyc
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded

But if I run a target that has it listed as a dependency, the .pyc files get created. The file glob picks them up the next time:

laz@builder-git_bando ~/git/bando$ bazel test //build:requirements_test
(23:53:46) INFO: Current date is 2022-10-12
(23:53:58) INFO: Analyzed target //build:requirements_test (67 packages loaded, 4020 targets configured).
(23:53:58) INFO: Found 1 test target...
Target //build:requirements_test up-to-date:
  bazel-bin/build/requirements_test
(23:56:33) INFO: Elapsed time: 167.735s, Critical Path: 154.48s
(23:56:33) INFO: 5 processes: 3 internal, 2 processwrapper-sandbox.
(23:56:33) INFO: Build completed successfully, 5 total actions
//build:requirements_test                                                PASSED in 153.6s

(23:56:33) INFO: Build completed successfully, 5 total actions
laz@builder-git_bando ~/git/bando$ bazel query 'deps(@pypi__pip//:lib)' | grep pyc
Loading: 0 packages loaded
@pypi__pip//:pip/__pycache__/__init__.cpython-310.pyc
@pypi__pip//:pip/__pycache__/__main__.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/__init__.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/build_env.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/cache.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/configuration.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/exceptions.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/pyproject.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/self_outdated_check.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/wheel_builder.cpython-310.pyc
@pypi__pip//:pip/_internal/cli/__pycache__/__init__.cpython-310.pyc
@pypi__pip//:pip/_internal/cli/__pycache__/autocompletion.cpython-310.pyc
@pypi__pip//:pip/_internal/cli/__pycache__/base_command.cpython-310.pyc

So for local caching, adding -B breaks test execution caching between executions 1 and 2, but the .pyc files don't change so the cache will be a hit on subsequent runs.

For remote caching, -B fixes the first invocation of the test b/c the .pyc files aren't included as deps just after the wheel install, but it will break the cache on subsequent runs because they get deposited on the FS when code is run.

Unless you would prefer doing readonly chowns on every wheel install, like we do with the hermetic python, I think ignoring the .pyc files in the glob is the proper fix and this PR stands as-is?

lazcamus avatar Oct 13 '22 00:10 lazcamus

I don't think making these .pyc files read-only makes sense as it does with the hermetic toolchain. Probably ignoring them from the glob is better in this case.

@mattem what do you think?

f0rmiga avatar Nov 02 '22 18:11 f0rmiga

@lazcamus could you resolve the conflicts, please? I'll go ahead and merge this.

f0rmiga avatar Dec 01 '22 19:12 f0rmiga

the conflicts are from it being fixed by https://github.com/bazelbuild/rules_python/pull/881 ... guess I should have added more comments ;)

closing this PR :)

lazcamus avatar Dec 06 '22 10:12 lazcamus