
Issue with python binary wheels when several .proto files have the same name.

Open popzxc opened this issue 6 years ago • 10 comments

Metadata:

Version: protobuf v3.9.1
Language: Python
Python version: 3.6.8

Main topic

At the moment, protobuf for Python behaves inconsistently when it is installed via pip install protobuf.

If you compile several protobuf files with the same file name to Python and then try to import both, you get the following output:

TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "same_name.proto":
  same_name.proto: A file with this name is already in the pool.

An important note that distinguishes this issue from similar ones opened before: the two files with the same name declare different package names.

How to reproduce:

# Create basic folder
mkdir protobuf_issue
cd protobuf_issue

# Create a virtualenv
python3 -m venv protobuf_issue_venv
source protobuf_issue_venv/bin/activate

# Install protobuf
pip install protobuf

# Create folders for the proto files, then the proto files themselves
# (same file name, different package names).
# printf is used instead of echo because bash's echo does not expand \n by default.
mkdir a_proto b_proto
printf 'syntax = "proto3";\npackage a;\nmessage A { uint64 a = 1; }\n' > a_proto/same_name.proto
printf 'syntax = "proto3";\npackage b;\nmessage B { uint64 b = 1; }\n' > b_proto/same_name.proto

# Create folders for python packages and compile protobuf
mkdir a b
touch a/__init__.py
touch b/__init__.py
protoc --proto_path=a_proto --python_out=a same_name.proto
protoc --proto_path=b_proto --python_out=b same_name.proto

# Run python
python

Then enter the following in the REPL:

import a.same_name_pb2
import b.same_name_pb2

After entering import b.same_name_pb2, you get an exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/protobuf_issue/b/same_name_pb2.py", line 23, in <module>
    serialized_pb=_b('\n\x0fsame_name.proto\x12\x01\x62\"\x0e\n\x01\x42\x12\t\n\x01\x62\x18\x01 \x01(\x04\x62\x06proto3')
  File "/home/user/protobuf_issue/protobuf_issue_venv/lib/python3.6/site-packages/google/protobuf/descriptor.py", line 879, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "same_name.proto":
  same_name.proto: A file with this name is already in the pool.
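
The serialized_pb argument in the traceback is a serialized FileDescriptorProto. As a rough illustration, the sketch below decodes its top-level fields with a small hand-written wire-format reader. The helper names read_varint and top_level_fields are made up for this example; field numbers 1, 2, and 12 are name, package, and syntax in descriptor.proto. It shows that the descriptor carries both the file name and the package, while the error message refers only to the file name.

```python
# Bytes copied from the traceback above (b/same_name_pb2.py).
SERIALIZED_PB = (
    b'\n\x0fsame_name.proto\x12\x01\x62\"\x0e\n\x01\x42\x12\t\n\x01\x62'
    b'\x18\x01 \x01(\x04\x62\x06proto3'
)


def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def top_level_fields(buf):
    """Map field number -> list of raw payloads for length-delimited fields."""
    fields = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            # Every top-level field in this particular descriptor is length-delimited.
            raise ValueError(f"unexpected wire type {wire_type}")
        length, pos = read_varint(buf, pos)
        fields.setdefault(field_number, []).append(buf[pos:pos + length])
        pos += length
    return fields


fields = top_level_fields(SERIALIZED_PB)
file_name = fields[1][0].decode()   # FileDescriptorProto.name
package = fields[2][0].decode()     # FileDescriptorProto.package
syntax = fields[12][0].decode()     # FileDescriptorProto.syntax

print(file_name)  # same_name.proto
print(package)    # b
print(syntax)     # proto3
```

The descriptor generated for a/same_name_pb2.py would differ in its package and message fields but would carry the same name field, which is what the pool complains about.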

Workaround

As I found online, you can work around this problem by installing protobuf without binaries:

pip uninstall protobuf
pip install --no-binary=protobuf protobuf

After doing that, the example above works as expected and no errors occur.
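
Since the workaround depends on which implementation is loaded, it can help to check the active backend before and after reinstalling. A minimal sketch, assuming the google.protobuf.internal.api_implementation module is available in the installed version (it is an internal module, so it may change between releases); the function name active_protobuf_backend is made up for this example:

```python
def active_protobuf_backend():
    """Return the protobuf backend name (e.g. 'cpp' or 'python'), or None if
    protobuf is not installed."""
    try:
        # Internal module: not part of the public API, used here only for inspection.
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return api_implementation.Type()


backend = active_protobuf_backend()
print(backend)
```

If this prints python after the reinstall, the pure-Python implementation is in use, which matches the case where the example stops raising.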

Expected behavior

I see that the documentation for Python generated code says:

The Python code generated by the protocol buffer compiler is completely unaffected by the package name defined in the .proto file. Instead, Python packages are identified by directory structure.

However, in my use case the .proto files are received from a remote server and compiled at runtime. If I have to follow those rules, I will have to parse each .proto file manually to extract its package name, create a folder for it, and move the file into that folder. The compiler will then parse the file again and ignore that important piece of data. That's not very convenient.

I hope to see a protobuf flag, or some other mechanism, that makes the Python generated files respect the package name in the .proto file.
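
For reference, the manual step described above could look roughly like the sketch below. It is only an approximation: the regular expression ignores comments and other corner cases of the .proto grammar, and the names extract_package and destination_for are made up for this example.

```python
import re
from pathlib import PurePosixPath

# Matches a top-level statement such as "package foo.bar;" at the start of a line.
_PACKAGE_RE = re.compile(r'^\s*package\s+([A-Za-z_][\w.]*)\s*;', re.MULTILINE)


def extract_package(proto_source):
    """Return the declared package name, or None if there is no package statement."""
    match = _PACKAGE_RE.search(proto_source)
    return match.group(1) if match else None


def destination_for(proto_source, file_name):
    """Return a relative path that mirrors the package, e.g. a.b -> a/b/<file_name>."""
    package = extract_package(proto_source)
    if package is None:
        return PurePosixPath(file_name)
    return PurePosixPath(*package.split('.'), file_name)


source_a = 'syntax = "proto3";\npackage a;\nmessage A { uint64 a = 1; }\n'
source_b = 'syntax = "proto3";\npackage b;\nmessage B { uint64 b = 1; }\n'

print(destination_for(source_a, "same_name.proto"))  # a/same_name.proto
print(destination_for(source_b, "same_name.proto"))  # b/same_name.proto
```

Compiling both files against one shared --proto_path root would then give protoc the distinct file names a/same_name.proto and b/same_name.proto, so the two descriptors would no longer share a name.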

popzxc avatar Aug 22 '19 13:08 popzxc

Packages do not affect pure-Python protobuf if the user gets the descriptors directly through the directory structure. However, there is a problem if the user uses the descriptor pool to find a descriptor. The documentation is out of date and we need to fix it.

We have two versions of Python protobuf: one is pure Python and the other uses the cpp extension. The Python binary wheels use the cpp extension implementation, which correctly performs a duplicate-registration check for the descriptor pool.

Pure Python used to not check for duplicate descriptor registrations, but we added the duplicate check to the descriptor pool a few months ago: https://github.com/protocolbuffers/protobuf/blob/66540237ca212401ec1e279224b8db40f52e4ab9/python/google/protobuf/descriptor_pool.py#L143

However, it looks like the duplicate-registration check for FileDescriptor was missing in pure Python.
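
To make the duplicate-registration check concrete, here is a toy model of a pool keyed by file name. It is not the protobuf implementation or its API: ToyDescriptorPool and add_serialized_file are hypothetical names, and tolerating an identical re-registration is an assumption of this sketch.

```python
class ToyDescriptorPool:
    """Illustrative stand-in for a descriptor pool; not the real protobuf class."""

    def __init__(self):
        self._files = {}  # file name -> serialized descriptor bytes

    def add_serialized_file(self, name, serialized):
        existing = self._files.get(name)
        # Assumption: re-adding identical content is harmless; different content
        # under an already-registered name is rejected.
        if existing is not None and existing != serialized:
            raise TypeError(f"{name}: A file with this name is already in the pool.")
        self._files[name] = serialized


pool = ToyDescriptorPool()
pool.add_serialized_file("same_name.proto", b"package a; message A")

try:
    # Different package, same file name: rejected because the key is the file name.
    pool.add_serialized_file("same_name.proto", b"package b; message B")
    conflict = None
except TypeError as exc:
    conflict = str(exc)

# Path-qualified file names do not collide.
pool.add_serialized_file("a/same_name.proto", b"package a; message A")
pool.add_serialized_file("b/same_name.proto", b"package b; message B")

print(conflict)
```

In this model the package never takes part in the lookup, which mirrors the behavior reported at the top of the issue.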

anandolee avatar Aug 22 '19 23:08 anandolee

Does the added check verify whether the provided package fields differ? That is the core point of this issue.

popzxc avatar Sep 25 '19 06:09 popzxc

This issue is still present. Do you guys have any solution for it?

ashokpant avatar Jul 14 '21 06:07 ashokpant

I believe the issue is still present when extracting descriptors from a descriptor pool. Is there any update on when a fix can be added?

Note: I cannot simply switch to the pure-Python implementation of protobuf, as it would slow everything down.

andrewwang-moveworks avatar Jul 29 '22 18:07 andrewwang-moveworks

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot] avatar Apr 01 '24 10:04 github-actions[bot]

I don't think it has been fixed.

popzxc avatar Apr 02 '24 07:04 popzxc