protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

`py_proto_library` `_virtual_imports` causing import name collisions

Open dougthor42 opened this issue 1 year ago • 2 comments

:exclamation: :exclamation: :exclamation: Migrated from https://github.com/bazelbuild/rules_python/issues/2133 :exclamation: :exclamation: :exclamation:

🐞 bug report

Affected Rule

py_proto_library

Is this a regression?

I haven't investigated yet, sorry :face_with_diagonal_mouth:

Description

It appears that the modifications to PYTHONPATH (added when py_proto_library uses a proto_library that has strip_import_prefix) can cause multiple paths to be included that have the same python namespaces defined.

For example: all of these paths have a doug python package in them:

bazel-bin/src/doug/foo_bin.runfiles/_main/src
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/a_proto
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto
$ tree -L 3 bazel-bin/src/doug/foo_bin.runfiles/_main/src
bazel-bin/src/doug/foo_bin.runfiles/_main/src
├── doug
│   ├── foo_bin -> /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/execroot/_main/bazel-out/k8-fastbuild/bin/src/doug/foo_bin
│   ├── foo.py -> /usr/local/google/home/dthor/dev/pyle/src/doug/foo.py
│   ├── __init__.py
│   └── proto2
│       ├── __init__.py
│       └── _virtual_imports
└── __init__.py
$ tree -L 3 bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/a_proto/
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/a_proto/
├── doug
│   ├── __init__.py
│   ├── proto2
│   │   ├── a_pb2.py -> /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/execroot/_main/bazel-out/k8-fastbuild/bin/src/doug/proto2/_virtual_imports/a_proto/doug/proto2/a_pb2.py
│   │   ├── __init__.py
│   │   └── __pycache__
│   └── __pycache__
│       └── __init__.cpython-311.pyc
└── __init__.py
$ tree -L 3 bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto/
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto/
├── doug
│   ├── __init__.py
│   └── proto2
│       ├── bar_pb2.py -> /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/execroot/_main/bazel-out/k8-fastbuild/bin/src/doug/proto2/_virtual_imports/bar_proto/doug/proto2/bar_pb2.py
│       └── __init__.py
└── __init__.py

This causes any python code that runs import doug (or import doug.x.y.z) to look in the _virtual_imports directory because it shows up later in sys.path:

doug module:
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto/doug/__init__.py
bar_pb2 module:
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto/doug/proto2/bar_pb2.py

Main question: Am I doing something wrong or is this actually (un)intended behavior?

🔬 Minimal Reproduction

Bear with me, as there's a fair bit of setup.

We use proto_library.strip_import_prefix because the some protos import others, and those proto import statements do not include src directory, as you'll see below.

Once you get the project structure in place, run:

bazel run //src/doug:foo_bin

This will print out some debugging info, namely sys.path and the path for the imported doug package and foo_pb2 module.

Actual Behavior

You'll see that the doug package will be loaded from foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto/doug/__init__.py.

Expected Behavior

The doug package should be loaded from foo_bin.runfiles/_main/src/doug/__init__.py.

Investigation

Showing correct import path

Edit ./src/doug/BUILD.bazel to remove the dependency on //src/doug/proto2:bar_pb2 and run bazel run //src/doug:foo_bin again.

You'll see that the doug package is correctly found at foo_bin.runfiles/_main/src/doug/__init__.py (yes, the python code will then throw ImportError about foo_pb2.py, but that's expected).

Modifying proto imports fixes things

Revert those changes. Now edit ./src/doug/proto2/BUILD.bazel and remove the strip_import_prefix lines. Run foo_bin again and you'll see an error related to building the protos because the proto import path is incorrect.

With strip_import_prefixes removed, edit ./src/doug/proto2/bar.proto so that the import statement includes src/ and run foo_bin again.

-import "doug/proto2/a.proto";  // IMPORTANT
+import "/src/doug/proto2/a.proto";  // IMPORTANT

Things will work and you'll notice that the bar_pb2 is being loaded from foo_bin.runfiles/_main/src/doug/proto2/bar_pb2.py - not the _virtual_imports directory.

Sadly this "modify proto imports" fix is not an option for us - we can't update our proto imports without significant work.

Project structure and Files

.
├── src/
│   ├── BUILD.bazel
│   └── doug/
│       ├── BUILD.bazel
│       ├── foo.py
│       ├── __init__.py
│       └── proto2
│           ├── a.proto
│           ├── bar.proto
│           └── BUILD.bazel
├── BUILD.bazel
└── MODULE.bazel

./MODULE.bazel and ./BUILD.bazel have typical things you'd see for a python+proto project. We're using rules_python 0.33.1, protobuf 21.7, and rules_proto 5.3.0-21.7.

# src/BUILD.bazel
# gazelle:python_root
# src/doug/foo.py
# Just debugging things.
import sys

for p in sorted(sys.path):
    print(p)

print("*" * 50)
import doug
print("doug module:")
print(doug.__file__)

import doug.proto2.bar_pb2 as bar_pb2
print("bar_pb2 module:")
print(bar_pb2.__file__)
# src/doug/BUILD.bazel
load("@rules_python//python:defs.bzl", "py_binary", "py_library")
py_library(
    name = "foo",
    srcs = ["foo.py"],
    imports = [".."],  # because of `python_root`
    visibility = ["//visibility:public"],
    deps = [
        "//src/doug/proto2:bar_pb2",
    ],
)

py_binary(
    name = "foo_bin",
    main = "foo.py",
    srcs = ["foo.py"],
    imports = [".."],
    deps = [
        ":foo",
    ],
)

The contents of the proto files don't matter too much, except that one proto file must reference the other via an import:

// src/doug/proto2/bar.proto
syntax = "proto2";
import "google/protobuf/timestamp.proto";
import "doug/proto2/a.proto";  // IMPORTANT
package doug.proto2;

message Label {
    optional string key = 1;
    optional string value = 2;
    optional google.protobuf.Timestamp timestamp = 3;
}
// src/doug/proto2/a.proto
syntax = "proto2";
package doug.proto2;
# src/doug/proto2/BUILD.bazel
load("@protobuf//bazel:proto_library.bzl", "proto_library")
load("@protobuf//bazel:py_proto_library.bzl", "py_proto_library")

proto_library(
    name = "a_proto",
    srcs = ["a.proto"],
    strip_import_prefix = "/src",
    visibility = ["//visibility:public"],
    deps = ["@protobuf//:descriptor_proto"],
)

proto_library(
    name = "bar_proto",
    srcs = ["bar.proto"],
    strip_import_prefix = "/src",
    visibility = ["//visibility:public"],
    deps = [
        "@protobuf//:timestamp_proto",
        ":a_proto",
    ],
)

py_proto_library(
    name = "bar_pb2",
    visibility = ["//visibility:public"],
    deps = [":bar_proto"],
)

🔥 Exception or Error

In "Minimal Reproduction", but the gist is ModuleNotFoundError.

🌍 Your Environment

Operating System:

gLinux

Output of bazel version:

$ bazel version
Bazelisk version: v1.20.0
Build label: 7.3.2
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Tue Oct 1 17:46:05 2024 (1727804765)
Build timestamp: 1727804765
Build timestamp as int: 1727804765

Rules_python version:

0.36.0

Anything else relevant?

  • protobuf 28.2

dougthor42 avatar Aug 20 '24 23:08 dougthor42

Hi @dougthor42 -

Is this issue reproducible with the latest release? Protobuf Python version 21.7 is no longer supported. You can find the support timelines for more recent versions of Protobuf Python at https://protobuf.dev/support/version-support/#python

JasonLunn avatar Oct 04 '24 21:10 JasonLunn

Thanks @JasonLunn. I didn't realize I was using such an old version, haha.

This is reproducible with the latest release 28.2.

I've updated the original description:

  • use @protobuf instead of @rules_proto
  • use @protobuf instead of @com_google_protobuf
  • Update versions
  • fix proto comments from # to //

dougthor42 avatar Oct 18 '24 20:10 dougthor42

I'm hitting the same issue, it also appears to be a long-standing issue, having found another report of this issue in grpc repo: https://github.com/grpc/grpc/issues/24024

I've hit this issue when trying to add py_proto_library targets for the corresponding cc_proto_library targets for the foxglove_schemas_protobuf module: https://github.com/bazelbuild/bazel-central-registry/blob/main/modules/foxglove_schemas_protobuf/0.2.1.bcr.1/overlay/BUILD.bazel

Is there some sort of workaround? Like some way to make a py_library that would coalesce the py-proto libraries?

mikael-s-persson avatar Jan 08 '25 21:01 mikael-s-persson

FYI, I just added a test case in the examples folder to reproduce the issue: https://github.com/protocolbuffers/protobuf/commit/10737eadc42094627db0f30461687538646980fa

(branch: https://github.com/mikael-s-persson/protobuf/tree/fix/multi_proto_py_lib)

Hopefully, that makes it easy to debug or test against. I haven't found an appropriate test folder other than "examples" to put the repro in, but I'm happy to move it.

mikael-s-persson avatar Jan 08 '25 23:01 mikael-s-persson

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

github-actions[bot] avatar Apr 10 '25 10:04 github-actions[bot]

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.

This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.

github-actions[bot] avatar Apr 25 '25 10:04 github-actions[bot]