rules_python
rules_python copied to clipboard
TFX namespace packages are not installed properly
🐞 bug report
Affected Rule
requirement
Is this a regression?
No
Description
I'm trying to install the https://github.com/tensorflow/tfx package properly, which has two namespace packages on PyPI:
- one is tfx, which has a comprehensive list of dependencies;
- the other one is ml-pipelines-sdk, a leaner version with fewer dependencies.
Note, the first version (tfx) depends on the seconds one (ml-pipelines-sdk). The relevant parts in the setup.py is at https://github.com/tensorflow/tfx/blob/4e08d44b4a99bfc93a6cc94b2b9c242c94b2ec8e/setup.py#L284-L297.
I am more interested in the first one.
The entry on my requirement file is like
cat requirements.in
tfx==1.6.0
When installing tfx, it's supposed to be installed in just one tfx folder. But with rules_python, tfx appears in two places, e.g.
ls /path/to/bazel-bin/myapp/debug_python.runfiles/vendor_python_ml_pipelines_sdk/tfx
__init__.py dependencies.py dsl examples experimental orchestration proto py.typed types utils version.py
ls /path/to/bazel-bin/myapp/debug_python.runfiles/vendor_python_tfx/tfx
__init__.py components dependencies.py dsl examples experimental extensions orchestration proto py.typed scripts tools types utils v1 version.py
Since vendor_python_ml_pipelines_sdk/tfx is before vendor_python_tfx/tfx in sys.path, so import tfx also import the first one, and results in error.
When I installed tfx without bazel with python -m pip install tfx, there is just
ls /path/to/venv/lib/python3.8/site-packages/tfx
__init__.py components dependencies.py dsl examples experimental extensions orchestration proto py.typed scripts tools types utils v1 version.py
My understanding is that the contents of ml-pipelines-sdk and tfx should be merged into a single tfx folder when installing tfx because they are both under the namespace tfx. But rules_python may not have handled such case properly, so two tfx folders appear and cause confusion during import.
🔬 Minimal Reproduction
try
pip-compile --allow-unsafe --annotation-style=line --output-file=generated_requirements.txt requirements.in
with the single-entry requirements.in shown above and then put generated_requirements.txt in WORKSPACE, e.g.
load("@rules_python//python:pip.bzl", "pip_install", "pip_parse")
pip_parse(
name = "vendor_python",
timeout = 1200,
quiet = False,
requirements_lock = "//path/to:generated_requirements.txt",
)
load("@vendor_python//:requirements.bzl", "install_deps")
install_deps()
🔥 Exception or Error
from tfx.components.trainer import executor doesn't work. Note tfx.components package is only included in the comprehensive version (tfx), NOT in the lean version (ml-pipelines-sdk) (See the difference between ls outputs above).
🌍 Your Environment
Operating System:
macos
Output of bazel version:
bazel 4.2.1
Rules_python version:
Tried both 0.4.0 and 0.6.0, neither works.
@hrfuller is this related to something you experienced at Twitter? We use Tensorflow at my company but not TFX, and so wouldn't have seen this.
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!
This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"