rules_python should offer an aspect detecting pip hub clashes
🚀 feature request
Relevant Rules
hubs created by pip.parse.
Description
Multiple hubs should not be used in the same dependency tree; otherwise the same Python module can end up in the tree more than once, in different versions.
While this is easy to follow in small projects, ensuring it never happens by accident in large projects is a challenge, especially if the different hubs reach a target as non-obvious transitive dependencies.
Since the order of dependencies affects which version of a duplicated module is found first, this can cause surprising behavior in a number of scenarios.
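To illustrate the ordering effect, here is a minimal sketch in plain Python, outside of Bazel. It assumes two made-up hubs (`hub_a`, `hub_b`) that each ship a different version of a hypothetical module `dupmod`, and uses `sys.path` order as a stand-in for dependency order. It is not how rules_python assembles the import path, only a demonstration that the first match wins.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def first_version_found(order):
    """Import a duplicated module with the given search-path order.

    `order` lists hypothetical hub names; each hub gets its own directory
    holding a different version of the same module.
    """
    versions = {"hub_a": "1.0", "hub_b": "2.0"}  # made-up hubs and versions
    with tempfile.TemporaryDirectory() as tmp:
        dirs = []
        for hub in order:
            hub_dir = Path(tmp) / hub
            hub_dir.mkdir()
            (hub_dir / "dupmod.py").write_text(f'VERSION = "{versions[hub]}"\n')
            dirs.append(str(hub_dir))

        saved_path = list(sys.path)
        sys.path[:0] = dirs  # earlier entries are searched first
        try:
            sys.modules.pop("dupmod", None)  # force a fresh import
            importlib.invalidate_caches()
            return importlib.import_module("dupmod").VERSION
        finally:
            sys.path[:] = saved_path
            sys.modules.pop("dupmod", None)


print(first_version_found(["hub_a", "hub_b"]))  # 1.0
print(first_version_found(["hub_b", "hub_a"]))  # 2.0
```

Swapping the order silently changes which version gets imported, and that is the kind of surprise an accidental hub mix can produce.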
Describe the solution you'd like
I believe one can detect clashing hubs with a Bazel aspect. I did a naive implementation locally and it worked well for my toy examples.
It would be great if rules_python offered an aspect for detecting hub clashes.
In my naive implementation I relied on the naming scheme pip.parse uses to create the workspaces for the individual Python modules contained in a hub.
While it is easy enough to unit test this works as expected given a certain Bazel and rules_python version, such code relying on rules_python implementation details is better suited to live in rules_python itself.
An alternative could be an aspect ensuring that no Python module offered by a hub exists twice in the dependency tree. This would allow combining hubs as long as no erroneous dependency duplication is introduced.
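The two checks can be sketched as follows, in plain Python rather than as a Starlark aspect. This is only an illustration of the logic, not my actual implementation. It assumes the aspect has already collected the repository names in a target's transitive closure, and that those names follow a pattern like `<prefix>+<hub>_<python version>_<package>`. That pattern, the hub names `pypi_a` and `pypi_b`, and the function names are all assumptions made for the example; the real naming scheme is a rules_python implementation detail and may differ.

```python
import re
from collections import defaultdict


def split_repo_name(repo_name, hub_names):
    """Return (hub, package) if repo_name belongs to a known hub, else None.

    ASSUMPTION: repo names end in "<hub>_<digits>_<package>", for example
    "rules_python++pip+pypi_a_311_requests". Illustrative only.
    """
    for hub in hub_names:
        pattern = rf"(?:^|\+){re.escape(hub)}_(\d+)_(\w+)$"
        match = re.search(pattern, repo_name)
        if match:
            return hub, match.group(2)
    return None


def find_clashes(repo_names, hub_names):
    """Return (hubs in use, packages supplied by more than one hub)."""
    hubs_by_package = defaultdict(set)
    for repo in repo_names:
        parsed = split_repo_name(repo, hub_names)
        if parsed:
            hub, package = parsed
            hubs_by_package[package].add(hub)
    hubs_used = sorted(set().union(*hubs_by_package.values()))
    duplicated = {
        pkg: sorted(hubs)
        for pkg, hubs in hubs_by_package.items()
        if len(hubs) > 1
    }
    return hubs_used, duplicated


# Hypothetical transitive closure of one target.
closure = [
    "rules_python++pip+pypi_a_311_requests",
    "rules_python++pip+pypi_a_311_urllib3",
    "rules_python++pip+pypi_b_311_urllib3",
    "rules_python++pip+pypi_b_311_numpy",
    "bazel_skylib",
]
hubs_used, duplicated = find_clashes(closure, ["pypi_a", "pypi_b"])
print(hubs_used)   # ['pypi_a', 'pypi_b']
print(duplicated)  # {'urllib3': ['pypi_a', 'pypi_b']}
```

The strict check fails as soon as `hubs_used` has more than one entry. The relaxed alternative fails only when `duplicated` is non-empty, so it tolerates mixed hubs as long as no package comes from two of them.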
Describe alternatives you've considered
I am not aware of other measures to ensure hubs never clash.
I am not sure if I am +1 on this because sometimes you may want to have 2 hubs where you are migrating from one package version to another and you are mixing the hubs on purpose.
What is more, it can sometimes be desired to override deps subtree from one hub with deps subtree from another.
I would like to get feedback from other people who might benefit from this before discussing how we could implement it.
> sometimes you may want to have 2 hubs where you are migrating from one package version to another and you are mixing the hubs on purpose.
To prevent a potential misunderstanding: I want to give people a tool to detect and prevent pip hub clashes. I imagine it as an opt-in thing, not something rules_python enforces automatically for all users.
> What is more, it can sometimes be desired to override deps subtree from one hub with deps subtree from another.
Does a public example exist where one can see how this is properly done?
Ultimately, we want to avoid this problem entirely by letting a binary specify which hub it wants to use, and letting dependencies point to a more "abstract" reference for something so they auto-resolve to the "current" hub.
As of 1.7, this is now possible, but requires some additional work and isn't as seamless as I'd like. Ah, I thought I wrote some docs for how to do this. This doc is the next best starting point: https://rules-python.readthedocs.io/en/latest/howto/multi-platform-pypi-deps.html#how-to-multi-platform-pypi-dependencies
The couple bits of info it's missing are:
- Call add_transition_setting to register a custom flag (e.g. a hub_name flag) (https://rules-python.readthedocs.io/en/latest/api/rules_python/python/extensions/config.html#config.add_transition_setting)
- Use py_binary.config_settings to set that flag (https://rules-python.readthedocs.io/en/latest/api/rules_python/python/private/py_binary_rule.html#py_binary.config_settings)
- In the dependency (the py_library), write a select on the flag to pick the appropriate hub
Somewhat tedious to do right now, but it gives you a way to prevent the problem entirely.
All that said, until we get that missing seamless part implemented, inadvertently mixing hubs is still a headache. I'm +1 on an aspect to help identify when this occurs (reminds me of the "find py2-only things in a py3 binary" aspect from years ago, which was somewhat helpful), because it turns into a very confusing headache when it causes problems.