
using all_requirements causes OSError: [Errno 7] Argument list too long

Open anuragkanungo opened this issue 3 years ago • 2 comments

🐞 bug report

Affected Rule

The issue is caused by the rule: all_requirements

Is this a regression?

Yes, the previous version in which this bug was not present was: 0.9.0
The bug exists in: 0.10.2 and 0.11.0

Description

When using all_requirements in a rule like

py_binary(
    name = "mybinary",
    srcs = ["main.py"],
    main = "main.py",
    deps = all_requirements,
)

It causes this error:

Traceback (most recent call last):
  File "/home/user/.cache/bazel/_bazel_user/bde0c2a7e98d3dd9f99d720839ba72c4/execroot/org/bazel-out/k8-fastbuild/bin/third_party/ipython/ipython", line 392, in <module>
    Main()
  File "/home/user/.cache/bazel/_bazel_user/bde0c2a7e98d3dd9f99d720839ba72c4/execroot/org/bazel-out/k8-fastbuild/bin/third_party/ipython/ipython", line 382, in Main
    os.execv(args[0], args)
OSError: [Errno 7] Argument list too long: '/home/user/.cache/bazel/_bazel_user/bde0c2a7e98d3dd9f99d720839ba72c4/execroot/org/bazel-out/k8-fastbuild/bin/third_party/ipython/ipython.runfiles/python_x86_64-unknown-linux-gnu/bin/python3'

🌍 Your Environment

Operating System: Ubuntu 20.04
Output of bazel version: 5.2.0

anuragkanungo avatar Aug 16 '22 07:08 anuragkanungo

This is an issue with the upstream native bazel rules not rules_python. The problem is that the PYTHONPATH set by the python_stub_template is longer than the allowed system limits. You can try to set --noexperimental_python_import_all_repositories to avoid adding PYTHONPATH entries for all external repositories, but you may encounter issues with rules_docker, rules_k8s and others that rely on implicit repository imports. Another possible path forward would be to use https://github.com/aspect-build/rules_py which uses a site-packages based import system for python binaries IIRC.
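For reference, a sketch of how one might set that flag persistently in a `.bazelrc` (the flag name is taken verbatim from the comment above; the tradeoffs are as described):

```
# .bazelrc — stop adding a PYTHONPATH entry for every external repository.
# Caveat (per above): may break rules_docker, rules_k8s, and anything else
# that relies on implicit repository imports.
build --noexperimental_python_import_all_repositories
```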

hrfuller avatar Aug 19 '22 22:08 hrfuller

@hrfuller, the issue doesn't exist with rules_python version 0.9.0, but it does appear after that, with the same list of packages. I believe something changed in the refactor that happened after 0.9.0.

anuragkanungo avatar Aug 19 '22 22:08 anuragkanungo

Our team can confirm this behavior/bug in our environment, although we have yet to make an easily reproducible and shareable example. rules_python version 0.10.0 throws the error while 0.9.0 does not. As pointed out, the issue isn't with one's Python interpreter per se, despite the verbiage of the error, but rather with the PYTHONPATH created (the python_imports variable, we believe, around line ~294 of the generated script). We are not totally clear on exactly what's going on, but it feels like both parties are correct here: something did change in rules_python, but the thing actually throwing the error is upstream native Bazel.

Additionally, the situation appears even more complicated and subtle, as it depends on the host machine configuration (some VMs throw the error for some targets while others do not). We think this is related to different ulimits/max-arg limits across machines with different memory amounts and settings. That said, our very large CI machines do exhibit the issue while some random developer VMs appear not to, so it's not strictly a "sizing" issue. If it were possible to simply update an OS setting without recompiling the kernel, we would be happy with that as a reasonable solution.

Finally, for some background, our team does not use the all_requirements helper; rather, the error comes about naturally from our own codebase. We have about 450 external dependencies and maybe 100-200 first-party apps/libraries in a monorepo. A representative failing target shows roughly 25 first-party --output=package dependencies and 200 or so third-party external dependencies. These total a roughly 14,000-character python_imports line, which we think is what's causing the issue.

We'd love to help however possible with a fix. There are exciting features after 0.9.0 that we're eager to bring into our codebase. We really appreciate all the hard work from and for the community 😄

el-gee avatar Sep 23 '22 19:09 el-gee

Some further updates.

  • my previous post was incorrect wrt the versions: 0.10.0 has the issue, 0.9.0 does not. Post edited for clarity
  • the offending commit seems to be https://github.com/bazelbuild/rules_python/commit/09457c7a3efc939a30ecd50b438c52354cd0588e . This was verified by running our tests against each commit in the 0.10.0 release and seeing when things started to fail.
  • still no easily shareable & reproducible case, sadly

We'll try to dig in more as to exactly what specifically could have caused this from the commit.

el-gee avatar Sep 26 '22 21:09 el-gee

Just for context, all_requirements is considered a bit of an anti-pattern. Is there a way for you to explicitly list your dependencies instead? Does the same error occur?

groodt avatar Sep 26 '22 21:09 groodt

@groodt I just read your comment https://github.com/bazelbuild/rules_python/pull/715#issuecomment-1152152023 and it seems very similar to our situation, except you get lucky and go just under the limit. As I mentioned in my ugly wall of text 🙃 we do not use the all_requirements helper but rather at least attempt to explicitly list our dependencies (understand there are others in the thread too however).

Really appreciate your time and efforts on this all!

el-gee avatar Sep 26 '22 21:09 el-gee

@groodt, on the all_requirements question: we use it to provide ipython or jupyter tooling for ML experimentation, so that engineers don't have to predefine all the packages needed for experimentation. That makes it very useful.

anuragkanungo avatar Sep 26 '22 21:09 anuragkanungo

Also, using the flag --noexperimental_python_import_all_repositories does fix it, though I am not sure of the associated side effects.

anuragkanungo avatar Sep 26 '22 21:09 anuragkanungo

@anuragkanungo Yes, it's still an antipattern though. You can build tooling outside the rules to enumerate things explicitly if you need to.

groodt avatar Sep 27 '22 00:09 groodt

This appears to be a subtle bug arising from some latent assumptions between rules_python and the upstream Bazel native python_stub_template. The change effectively doubles the length of the PYTHONPATH, which is known to be an issue; see https://github.com/bazelbuild/bazel/blob/9d0163002ca63f4cdbaff9420380d72e2a6e38b3/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt#L424.

This occurs, afaict, because the template's Deduplicate function merely puts everything into a set. See: https://github.com/bazelbuild/bazel/blob/9d0163002ca63f4cdbaff9420380d72e2a6e38b3/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt#L232

But this is no longer valid as the modules from GetRepositoriesImports do not match what rules_python passes in; the natively found modules do not have site-packages suffixed. See: https://github.com/bazelbuild/bazel/blob/9d0163002ca63f4cdbaff9420380d72e2a6e38b3/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt#L192

Thus the PYTHONPATH will effectively contain two versions of the same package: /path/to/foo and /path/to/foo/site-packages, only one of which is valid in the sense of providing working Python imports.
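A minimal sketch of why a set-based dedup misses these (illustrative paths only, not the actual stub code):

```python
# Illustrative only: mimics a Deduplicate that keys on exact strings, as the
# stub template's set-based approach does.
def deduplicate(paths):
    seen = set()
    out = []
    for p in paths:
        if p not in seen:  # exact string match; no prefix awareness
            seen.add(p)
            out.append(p)
    return out

# Hypothetical entries: one from GetRepositoriesImports (no suffix), one
# passed in by rules_python (site-packages suffixed).
native_imports = ["/runfiles/pypi_foo"]
rules_python_imports = ["/runfiles/pypi_foo/site-packages"]

merged = deduplicate(native_imports + rules_python_imports)
# Both survive, so PYTHONPATH carries two entries per package.
print(merged)
```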

For many users this may not matter, but it seems to be an unintended side effect and, at worst, could be viewed as an incompatibility between rules_python and upstream Bazel, if we understand things correctly. If this summary is accurate, we're not clear exactly what the best next steps would be.

We've found roughly 4 variables one can tweak to try to sneak under the limit if one finds oneself in this situation.

  • first and foremost, limiting the number of deps, as each dep gets its own entry in the Python path
  • limiting the prefix for the external deps via pip_parse's name, as each dep will be prefixed with it
  • limiting the name of your target, as each dep will also effectively be prefixed with that in the pathing
  • limiting your username / workspace root depth
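A rough back-of-the-envelope sketch of how those four variables multiply (the path template here is an assumption for illustration, not the exact runfiles layout):

```python
# Hypothetical estimate: assume each dep contributes one PYTHONPATH entry
# shaped roughly like <root>/<target>.runfiles/<pip_prefix>_<dep>/site-packages.
def estimate_pythonpath_len(num_deps, pip_prefix, target_name, workspace_root,
                            avg_dep_name_len=15):
    per_entry = (len(workspace_root) + 1 + len(target_name) + len(".runfiles/")
                 + len(pip_prefix) + 1 + avg_dep_name_len
                 + len("/site-packages") + 1)  # trailing +1 for the ':' separator
    return num_deps * per_entry

# ~200 third-party deps quickly lands in the tens of thousands of characters,
# the same ballpark as the ~14,000-char python_imports line mentioned above.
print(estimate_pythonpath_len(200, "pip_deps", "mybinary",
                              "/home/user/.cache/bazel/execroot/org"))
```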

We haven't been able to make an easy test for this yet, but I imagine if one did the opposite of the suggestions (go larger for each rather than smaller) on a moderately sized package they could reproduce the error.

el-gee avatar Sep 30 '22 15:09 el-gee

Just to make sure I understand the situation: the basic problem is that, as a side effect of using pip_parse (or pip_install?), many repos are created, they end up on PYTHONPATH, and they make it too long. These repos are just internal implementation details; they don't need to be added to PYTHONPATH.

Is the above correct?

Also, using the flag --noexperimental_python_import_all_repositories does fix it, though I am not sure of the associated side effects.

The basic side effect is that every immediate subdirectory under the runfiles root will no longer be added to PYTHONPATH (see here).

The basic implication here is that Python libraries that don't include their repo name in their import statements will break. Python libraries that do include their repo name in their import statements will be OK.

edit: I should clarify this is the case ignoring anything else that adds entries to PYTHONPATH, such as e.g. py_library(imports=...)

i.e.,:

# WORKSPACE:
http_archive(name = "other_libs_repo", ...)  # fetched from github.com/other-libs-project
# Source layout of github.com/other-libs-project:
  other_libs/
    __init__.py
    other.py
http_archive(name = "more_repo", ...)  # fetched from github.com/more-project
# Source layout of github.com/more-project:
  __init__.py
  morelib.py

workspace(name = "myproject")
# Source layout
mylib1/
  __init__.py
  mylib1a.py
mylib2/
  __init__.py
  mylib2a.py

# -> results in runfiles layout
foo.runfiles/
  myproject/
    mylib1/
      __init__.py
      mylib1a.py
    mylib2/
      __init__.py
      mylib2a.py
  other_libs_repo/
    other_libs/
      __init__.py
      other.py
  more_repo/
    __init__.py
    morelib.py
# note I've omitted generated init files, which are enabled by default.

Before, "import mylib1" and "import other_libs" would work. After, they will no longer work. Whether this matters depends on what the code in a repo expects; "import more_repo.morelib" works in both cases, for example.
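Concretely, against the runfiles layout above, the flag changes which of these succeed (illustrative sketch, not runnable as-is):

```python
# With --noexperimental_python_import_all_repositories, only the runfiles
# root itself stays on sys.path (plus generated __init__ files):
import more_repo.morelib   # OK before and after: repo name is the package name
import myproject.mylib1    # OK: workspace name is part of the import path
import mylib1              # worked before; ImportError after the flag
import other_libs          # likewise breaks: relied on repo roots on sys.path
```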

Technically, you can work around this by setting e.g. py_binary(imports=<workspacename>). Each string in the imports attribute is treated as a repo-runfiles-root-relative path (see here).

But, understandably, this is not an appealing option if you have a lot of targets. It's also problematic if you have a dependency you don't control.
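A hedged sketch of that workaround in a BUILD file, assuming the `myproject` workspace name from the layout above (the value and its repo-runfiles-root-relative semantics follow the description above; verify against your Bazel version):

```python
# BUILD — hypothetical: re-add the repo root to the import path via the
# imports= workaround described above. "myproject" is the example workspace
# name; every target (or a shared macro) would need this.
py_binary(
    name = "mybinary",
    srcs = ["main.py"],
    main = "main.py",
    imports = ["myproject"],  # repo-runfiles-root-relative, per the comment above
)
```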

I have some ideas that might help solve this, but gtg now.

We haven't been able to make an easy test for this yet, but I imagine if one did the opposite of the suggestions (go larger for each rather than smaller) on a moderately sized package they could reproduce the error.

The execve manpage says that the argument/environment limits are based on the stack size ulimit. So you can probably do e.g. ulimit -s <smaller> to lower the limit and more easily reproduce. (When testing in your own shell, I highly recommend running ulimit and your test command in a sub-shell to avoid messing up your own shell's settings.)
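A sketch of that reproduction on Linux (the sizes are arbitrary examples; the kernel's total argv+environ budget is roughly stack-limit/4):

```shell
# Run everything in a sub-shell so the lowered limit doesn't stick.
(
  ulimit -s 512                                      # stack limit in KB -> ~128 KB arg/env budget
  BIGVAR=$(head -c 200000 /dev/zero | tr '\0' 'x')   # ~200 KB environment variable
  export BIGVAR
  # Any exec now fails the same way the py_binary stub's os.execv does:
  python3 -c 'print("ok")' || echo "Argument list too long, as expected"
)
```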

We've found roughly 4 variables one can tweak to try and sneak under the limit if one finds themselves in the situation.

Is there something we can change in the rules_python bzl code to also shorten these paths as a mitigation measure that would help?

but it seems to be an unintended side effect and at worst could be viewed as an incompatibility between rules_python and upstream bazel

The real root causes all lie in native.py_*. The repo name should never have been so tightly coupled to the Python import name, the repo roots should never have been always added to PYTHONPATH, and PYTHONPATH should never have had the closure of dependencies appended into it.

As someone said upthread, the real fix for this is a site-packages-like layout (or equivalent) so that PYTHONPATH and sys.path don't have to scale with the dependency closure size. I've been talking to the Bazel team internally, making the point that to do this efficiently, Bazel core needs some sort of feature to allow better control over where things are put within runfiles.

(This might be cold comfort insofar as rules_python is concerned, but I still think it's worth calling out to get fixed; I brought this up internally this week, actually.)

Ultimately, I think you want to enable that flag to remove the repo roots from path. The question then becomes how to get any imports that rely on the repo name working again. I have a couple ideas that we might be able to do on the rules_python Starlark side, but gtg right now.

rickeylev avatar Oct 01 '22 19:10 rickeylev

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

github-actions[bot] avatar Mar 31 '23 22:03 github-actions[bot]

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

github-actions[bot] avatar May 01 '23 22:05 github-actions[bot]