
perf: improve analysis performance by 95% for `py_binary` and `py_test` rules

Open tobyh-canva opened this issue 1 month ago • 11 comments

PR Status: Addressing Windows compatibility issues

Context

I've recently been looking into Bazel analysis performance in my company's monorepo (predominantly Java and TypeScript), and was surprised to find that rules_python accounts for almost 60% of the CPU time during analysis across our repo (433s of 741s).

From a profile, it was clear that depset.to_list() being called from _create_zip_file() accounted for the vast majority of the analysis time. So I focused on eliminating these depset expansions, which are expensive during analysis (for anyone unaware, see Bazel's Optimizing Performance page). Path normalization in _get_zip_runfiles_path also accounted for a large chunk of the analysis time.
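
To make the cost concrete, here's a minimal generic illustration (not the actual rules_python code) of why flattening hurts: depsets are cheap to build because they share structure, but to_list() materializes the full transitive set every time it's called.

```starlark
# Building depsets is cheap: each one just references its transitive parents.
d1 = depset(["a.py"])
d2 = depset(["b.py"], transitive = [d1])
d3 = depset(["c.py"], transitive = [d2])

# Flattening is not: to_list() walks the whole transitive graph and allocates
# a fresh list, so calling it per target during analysis costs O(total files)
# CPU and retained memory for every py_binary/py_test.
all_files = d3.to_list()  # ["a.py", "b.py", "c.py"] in some deterministic order
```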

What was slightly less clear, but eventually apparent, was that runfiles.empty_filenames being provided as an arg to the zipper action was also causing a hit to analysis performance. I traced this back to the fact that the runfiles.empty_filenames depset in this context is actually coming from the GetInitPyFiles Java class in the Python utils built into Bazel, and this class was also causing an implicit expansion of the runfiles.files depset. You can verify this by running a profile using my PR's changes and with --incompatible_default_to_explicit_init_py=false (the default).

Note that although the Bazel-native --build_python_zip flag is disabled by default on non-Windows systems, the flag only affects whether the zipapp is considered a default output; the analysis cost is incurred either way.

Intent

My strategy for optimizing this was basically to get rid of any depset.to_list() calls, and shift reproducible but expensive business logic (e.g. path normalization) from the analysis phase to the build phase.

Changes

  1. Introduced a wrapper for @bazel_tools//tools/zip:zipper, written in C++, which prepares the input manifest for the zipper tool, including all path rewriting.
  2. Changed the inputs to _create_zip_file to provide only what it needs (runfiles_without_exe), avoiding a depset expansion + filter.

I chose to write the script in C++ because it would keep the actual build action performant whilst avoiding adding new dependencies to rules_python (rules_cc is already a dependency).
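
For reviewers, the new wiring looks roughly like this (a sketch with illustrative parameter names, not the exact PR code, treating runfiles_without_exe as a depset of files): the Starlark side hands the depset straight through to the wrapper via a params file, and the wrapper does the path rewriting at build time before invoking the real zipper.

```starlark
def _create_zip_file(ctx, output, runfiles_without_exe, zip_gen_tool, zipper):
    args = ctx.actions.args()
    # A params file keeps the analysis-time cost O(1); the depset is only
    # flattened when the action is about to execute.
    args.use_param_file("@%s", use_always = True)
    args.add(output)
    args.add_all(runfiles_without_exe)
    ctx.actions.run(
        executable = zip_gen_tool,  # the C++ py_executable_zip_gen wrapper
        tools = [zipper],           # @bazel_tools//tools/zip:zipper
        arguments = [args],
        inputs = runfiles_without_exe,
        outputs = [output],
        mnemonic = "PythonZipper",
    )
```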

Testing

I haven't added new tests specific to this wrapper or the affected functionality, instead relying on the existing //tests. Please let me know if you'd prefer specific shell tests or C++ tests.

Snapshot tests

To verify builds were the same, I wrote some snapshot tests, which are not integrated in this branch. Let me know if you'd like these to be brought in (they're platform-specific, and the snapshots may need to be updated somewhat frequently for innocuous changes).

Manual tests

I manually ran the snapshot tests above on macOS, Linux, and Windows, comparing a snapshot from the main branch with this branch.

Results

Here's the --starlark_cpu_profile flame graph before/after these changes (this is profiling analysis of bazel build --nobuild //... in our monorepo).

[before/after flame graph images]

Overall the CPU time appears to have been reduced by at least 95% for py_binary and py_test targets.

The memory profile shows a similar 95% reduction in memory usage for py_binary and py_test.

  • All build outputs are identical after these changes.
  • The overall build performance is impacted by the compilation of the new C++ wrapper script.
  • The build performance of the actual PythonZipper action is effectively unchanged: the additional work done by the C++ wrapper is barely visible next to the work already being done by the zipper itself.

Notes

I wasn't aware that --incompatible_default_to_explicit_init_py=true made such a significant improvement to analysis performance. Is it worth mentioning this somewhere in the docs of rules_python? It smells to me like it just disables a legacy Google-specific quirk, though I could be wrong.

I initially moved the GetInitPyFiles logic from Java into the C++ wrapper, so at least it would remove the analysis performance hit. But since it's only optimizing legacy behavior, and users can just toggle a flag to disable it, I reverted that change and left the empty __init__.py generation alone.
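
For anyone who wants to try it in their own repo, the flag can be flipped in .bazelrc with standard flag syntax:

```
# Opt out of the legacy implicit __init__.py generation (and the implicit
# runfiles.files expansion that comes with it).
build --incompatible_default_to_explicit_init_py
```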

tobyh-canva avatar Oct 26 '25 22:10 tobyh-canva

Summary of Changes

Hello @tobyh-canva, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the analysis performance of py_binary and py_test rules in Bazel by addressing inefficiencies related to depset.to_list() expansions and path normalization. It introduces a new C++ wrapper to handle manifest preparation for the zipper tool, moving expensive logic from the analysis phase to the build phase, resulting in substantial reductions in both CPU time and memory consumption during Bazel analysis.

Highlights

  • Performance Improvement: Achieved a 95% reduction in CPU time and memory usage for py_binary and py_test targets during Bazel analysis by optimizing how Python zip manifests are generated.
  • Elimination of depset.to_list(): Replaced expensive depset.to_list() calls in Starlark with a C++ wrapper, shifting complex processing from the analysis phase to the build phase.
  • New C++ Wrapper for Zipper: Introduced py_executable_zip_gen, a C++ binary that prepares the input manifest for the zipper tool, handling path rewriting and __init__.py generation more efficiently.
  • Streamlined Starlark Logic: Simplified the _create_zip_file function in Starlark by reducing its inputs and offloading manifest generation details to the new C++ wrapper.

gemini-code-assist[bot] avatar Oct 26 '25 22:10 gemini-code-assist[bot]

> I wasn't aware that --incompatible_default_to_explicit_init_py=true made such a significant improvement to analysis performance. Is it worth mentioning this somewhere in the docs of rules_python? It smells to me like it just disables a legacy Google-specific quirk, though I could be wrong.

Yes, it's a historical thing that was difficult to remove when much of the rules_python implementation was in upstream bazelbuild/bazel. I think now that the Starlarkification efforts have landed, it might be possible to flip the default. The tracking issue is here: https://github.com/bazel-contrib/rules_python/issues/2945

WDYT @rickeylev? Are we now in a position to flip this?

groodt avatar Oct 26 '25 23:10 groodt

Thanks for this! Yeah, the runfiles.files.to_list() flattening was necessary to filter out the main executable file; it predates runfiles_without_exe.

What does perf look like if L853 for artifact in runfiles.files.to_list(): ... is moved to the execution phase using map_each (and the C++ tool isn't used)? By using runfiles_without_exe, the custom filtering logic doesn't need to occur, so the depset can be passed directly.
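
Spelled out, the suggested experiment is something like this (a sketch with illustrative names; the map function has to be a top-level def so it can be deferred):

```starlark
def _map_zip_runfiles(file):
    # Runs at execution time, once per runfile, when the Args are expanded;
    # the real code would apply the _get_zip_runfiles_path rewriting here.
    return file.short_path

def _zip_manifest_args(ctx, runfiles_without_exe):
    args = ctx.actions.args()
    # No to_list(), no filter: runfiles_without_exe already excludes the main
    # executable, so the depset can be handed to add_all as-is.
    args.add_all(runfiles_without_exe, map_each = _map_zip_runfiles)
    return args
```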

Basically, I'm wondering what part of the perf improvement comes from avoiding the analysis-time depset flattening vs. C++ doing string operations instead of Starlark/Java. I'm not super keen on requiring the on-demand building of a C++ binary (and C++ is a bit of a multi-platform PITA), but if the perf gains are this significant, it's pretty compelling.

That get_zip_runfiles_path is showing up so prominently is weird/surprising. Those should be fairly simple string manipulations. But, looking at the paths.relativize implementation, it's actually doing quite a bit of work in there: normalizing both paths (which incurs splits/appends), then some zip+looping, etc. L876 can be replaced with a simple stripprefix() call. L882 can be replaced with e.g. "{workspace}/{path}" if not path.startswith("../") else path[3:]
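
Concretely, the two suggestions boil down to something like this (a sketch of the replacement helper; the real function may carry extra prefixes or arguments):

```starlark
def _get_zip_runfiles_path(path, workspace_name):
    # External runfiles arrive as "../<repo>/<path>"; stripping the "../"
    # replaces the paths.relativize() call. Everything else lives under the
    # main workspace name, replacing the paths.normalize() work.
    if path.startswith("../"):
        return path[3:]
    return "{}/{}".format(workspace_name, path)
```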

Also, how big are the depsets you're working with? 40k? 100k? I ask because the experimental venv construction logic has to do depset flattening.

rickeylev avatar Oct 27 '25 01:10 rickeylev

Hi @rickeylev :) I was also hoping I could avoid the additional C++ binary 😅

I didn't think to replace the calls to paths.relativize. I'll attempt your suggestions and regenerate the profile to check. Thanks! 🫡

> Also, how big are the depsets you're working with? 40k? 100k? I ask because the experimental venv construction logic has to do depset flattening.

I'm actually not sure, though I'd be happy to profile this experimental logic on our repo. Is there a config setting I should toggle to try it out?

tobyh-canva avatar Oct 27 '25 02:10 tobyh-canva

It is --@rules_python//python/config_settings:venv_site_packages=yes; it will use the code paths introduced in #2156 to build a virtualenv just like the venv or uv tools would.
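
For reference, that could be profiled with the same setup as the earlier flame graphs, e.g.:

```
bazel build --nobuild //... \
    --@rules_python//python/config_settings:venv_site_packages=yes \
    --starlark_cpu_profile=/tmp/venv.profile.gz
```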

aignas avatar Oct 27 '25 03:10 aignas

This is the profile after applying @rickeylev's suggestions: instead of using a C++ script, I replaced paths.relativize and paths.normalize with simpler logic. Oddly, map_zip_runfiles is still fairly significant (15% overall) in the profile. See #3381 for the actual implementation I tested this with.

[profile screenshot]

I thought map_each functions were deferred to build time, so I'm a bit confused why map_zip_runfiles is showing up in the analysis profile during a --nobuild run. Is this expected when we're using a closure for the map function?

tobyh-canva avatar Oct 27 '25 03:10 tobyh-canva

> map_zip_runfiles takes 46 seconds. half is format(), half is startswith()

Well, that's a big improvement, but yeah, that's still a lot of wasted time. At the least, let's merge #3381: it's a significant improvement without any of the questions about adding C++ tools. I'll go review that.

> --nobuild seems to trigger map_each in analysis, but it's supposed to be deferred to execution

Yeah, that's weird. Double-check your invocation? Otherwise it sounds like a Bazel bug. I don't see anything in your PR that would cause that.

rickeylev avatar Nov 09 '25 21:11 rickeylev

We talked about this in our maintainer meeting. We don't want to take on a built-from-source C++ dependency because of Windows; it's already difficult to support Windows without having to worry about C++ compilation. (We recently went through this with the Gazelle plugin: it took about a week to figure out, and we gave up on the Gazelle release because of it.)

So, options:

  1. Use this for non-Windows only.
  2. Set up prebuilt binaries.

rickeylev avatar Nov 22 '25 03:11 rickeylev

Also, we talked about the fact that the current algorithm uses startswith, which seems inefficient to do for every file. Ideas:

  1. You could do this only for the first file encountered from a particular repo.
  2. You could instead check the repo name and decide what to do (sketched after this list).
  3. Is the external prefix that we are sniffing for disabled by some combination of Bazel flags or not?
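
One way idea 2 could look (an untested sketch): map_each callbacks receive File objects, so the callback can consult the owning label rather than string-matching every path.

```starlark
def _map_zip_runfiles(file):
    owner = file.owner
    if owner != None and owner.workspace_name != "":
        # File from an external repo: short_path looks like "../<repo>/<path>".
        return file.short_path[3:]
    # Main-repo file; the workspace-name prefix is elided here, since getting
    # it into a top-level map_each is exactly the closure question above.
    return file.short_path
```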

aignas avatar Nov 22 '25 05:11 aignas

> 2. Set up prebuilt binaries.

I'm happy to come and help if you decide to go this route. https://blog.aspect.build/releasing-bazel-rulesets-rust is the pattern I've used. We don't even bother with cross-compiles; it's simple enough to have a GHA Windows machine build the Windows artifacts.

alexeagle avatar Dec 05 '25 15:12 alexeagle

That'd be great! Something I've long wanted to create is a prebuilt precompiler binary using embedded Python, either PyOxidizer or a regular embedded Python.

rickeylev avatar Dec 10 '25 07:12 rickeylev