TFX namespace packages are not installed properly

Open zyxue opened this issue 2 years ago • 1 comments

🐞 bug report

Affected Rule

requirement

Is this a regression?

No

Description

I'm trying to install TFX (https://github.com/tensorflow/tfx), which is published on PyPI as two packages that share the tfx namespace:

  • one is tfx, which has a comprehensive list of dependencies;
  • the other one is ml-pipelines-sdk, a leaner version with fewer dependencies.

Note that the first one (tfx) depends on the second one (ml-pipelines-sdk). The relevant part of setup.py is at https://github.com/tensorflow/tfx/blob/4e08d44b4a99bfc93a6cc94b2b9c242c94b2ec8e/setup.py#L284-L297.

I am more interested in the first one.

The entry in my requirements file looks like this:

cat requirements.in 
tfx==1.6.0

When installing tfx, it's supposed to be installed in just one tfx folder. But with rules_python, tfx appears in two places, e.g.

ls /path/to/bazel-bin/myapp/debug_python.runfiles/vendor_python_ml_pipelines_sdk/tfx
__init__.py  dependencies.py  dsl  examples  experimental  orchestration  proto  py.typed  types  utils  version.py

ls /path/to/bazel-bin/myapp/debug_python.runfiles/vendor_python_tfx/tfx             
__init__.py  components  dependencies.py  dsl  examples  experimental  extensions  orchestration  proto  py.typed  scripts  tools  types  utils  v1  version.py

Because vendor_python_ml_pipelines_sdk/tfx comes before vendor_python_tfx/tfx in sys.path, import tfx resolves to the first one, which results in an error.
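The shadowing described above can be reproduced outside Bazel with a minimal sketch. It only illustrates Python's standard import behavior, not rules_python internals: two directories named after the runfiles folders in this issue each get an empty stand-in tfx package (the real packages are not used), and the helper names make_package and try_import_components are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, subpackages):
    """Create root/tfx as a regular package (with __init__.py) plus empty subpackages."""
    pkg = root / "tfx"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    for name in subpackages:
        sub = pkg / name
        sub.mkdir()
        (sub / "__init__.py").write_text("")


def try_import_components(path_entries):
    """Return True if tfx.components imports with path_entries at the front of sys.path."""
    saved_path = list(sys.path)
    # Set aside any already-imported tfx modules so each attempt starts clean.
    saved_modules = {
        k: v for k, v in sys.modules.items() if k == "tfx" or k.startswith("tfx.")
    }
    for name in saved_modules:
        del sys.modules[name]
    sys.path[:0] = path_entries
    importlib.invalidate_caches()
    try:
        importlib.import_module("tfx.components")
        return True
    except ModuleNotFoundError:
        return False
    finally:
        # Restore the interpreter state exactly as it was before the attempt.
        sys.path[:] = saved_path
        for name in [m for m in sys.modules if m == "tfx" or m.startswith("tfx.")]:
            del sys.modules[name]
        sys.modules.update(saved_modules)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for the two runfiles directories; contents are simplified.
    lean = Path(tmp) / "vendor_python_ml_pipelines_sdk"
    full = Path(tmp) / "vendor_python_tfx"
    make_package(lean, ["dsl"])
    make_package(full, ["dsl", "components"])

    # The lean copy first: tfx binds to it and tfx.components cannot be found.
    lean_first = try_import_components([str(lean), str(full)])
    # The full copy first: tfx.components is found.
    full_first = try_import_components([str(full), str(lean)])
```

Because both copies contain an __init__.py, Python treats each as a regular package and stops at the first match on sys.path instead of combining the two directories.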

When I install tfx without Bazel, using python -m pip install tfx, there is just one tfx folder:

ls /path/to/venv/lib/python3.8/site-packages/tfx                             
__init__.py  components  dependencies.py  dsl  examples  experimental  extensions  orchestration  proto  py.typed  scripts  tools  types  utils  v1  version.py

My understanding is that the contents of ml-pipelines-sdk and tfx should be merged into a single tfx folder when installing tfx, because both live under the tfx namespace. But rules_python may not handle this case properly, so two tfx folders appear and the import resolves to the wrong one.
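To check whether a given interpreter sees the package split across several locations, a small diagnostic like the following can help. This is a sketch only: find_package_dirs is a hypothetical helper, not part of rules_python or pip, and the demo uses throwaway directories rather than a real install.

```python
import sys
import tempfile
from pathlib import Path


def find_package_dirs(name, search_path=None):
    """Return every <entry>/<name> directory found on search_path, in search order."""
    entries = sys.path if search_path is None else search_path
    found = []
    for entry in entries:
        # An empty sys.path entry means the current working directory.
        candidate = Path(entry or ".") / name
        if candidate.is_dir() and candidate not in found:
            found.append(candidate)
    return found


with tempfile.TemporaryDirectory() as tmp:
    # Two stand-in locations that both provide a top-level tfx folder.
    first = Path(tmp) / "vendor_python_ml_pipelines_sdk"
    second = Path(tmp) / "vendor_python_tfx"
    (first / "tfx").mkdir(parents=True)
    (second / "tfx").mkdir(parents=True)

    hits = find_package_dirs("tfx", [str(first), str(second)])
    hit_parents = [hit.parent.name for hit in hits]
```

Calling find_package_dirs("tfx") with no second argument inside the failing binary would list the real locations; more than one result means the package is split, and the first entry is the one import tfx picks up.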

🔬 Minimal Reproduction

Run

pip-compile --allow-unsafe --annotation-style=line  --output-file=generated_requirements.txt requirements.in

with the single-entry requirements.in shown above, then reference generated_requirements.txt in WORKSPACE, e.g.

load("@rules_python//python:pip.bzl", "pip_install", "pip_parse")

pip_parse(
    name = "vendor_python",
    timeout = 1200,
    quiet = False,
    requirements_lock = "//path/to:generated_requirements.txt",
)

load("@vendor_python//:requirements.bzl", "install_deps")
install_deps()

🔥 Exception or Error

from tfx.components.trainer import executor fails. Note that the tfx.components package is only included in the comprehensive version (tfx), not in the lean version (ml-pipelines-sdk); see the difference between the ls outputs above.

🌍 Your Environment

Operating System:

macos

Output of bazel version:

bazel 4.2.1

Rules_python version:

Tried both 0.4.0 and 0.6.0, neither works.

zyxue avatar Feb 19 '22 06:02 zyxue

@hrfuller is this related to something you experienced at Twitter? We use Tensorflow at my company but not TFX, and so wouldn't have seen this.

thundergolfer avatar Feb 22 '22 03:02 thundergolfer

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

github-actions[bot] avatar Dec 11 '22 22:12 github-actions[bot]

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

github-actions[bot] avatar Jan 10 '23 22:01 github-actions[bot]