protobuf
Issue with python binary wheels when several .proto files have the same name.
Metadata:
Version: protobuf v3.9.1
Language: Python
Python version: 3.6.8
Main topic
Currently, when protobuf for Python is installed via pip install protobuf, its behavior is inconsistent.
If several .proto files with the same file name are compiled to Python and you try to import both generated modules, you get the following error:
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "same_name.proto":
same_name.proto: A file with this name is already in the pool.
Important note that distinguishes this issue from similar ones opened before: the two files with the same name declare different package names.
How to reproduce:
# Create basic folder
mkdir protobuf_issue
cd protobuf_issue
# Create a virtualenv
python3 -m venv protobuf_issue_venv
source protobuf_issue_venv/bin/activate
# Install protobuf
pip install protobuf
# Create folders for the proto files, then the proto files themselves (same file name, different package names).
mkdir a_proto b_proto
# printf is used instead of echo because echo does not expand \n in every shell (bash needs echo -e).
printf 'syntax = "proto3";\npackage a;\nmessage A { uint64 a = 1; }\n' > a_proto/same_name.proto
printf 'syntax = "proto3";\npackage b;\nmessage B { uint64 b = 1; }\n' > b_proto/same_name.proto
# Create folders for python packages and compile protobuf
mkdir a b
touch a/__init__.py
touch b/__init__.py
protoc --proto_path=a_proto --python_out=a same_name.proto
protoc --proto_path=b_proto --python_out=b same_name.proto
# Run python
python
Then, inside the REPL, enter the following:
import a.same_name_pb2
import b.same_name_pb2
After you enter import b.same_name_pb2, you get an exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/protobuf_issue/b/same_name_pb2.py", line 23, in <module>
serialized_pb=_b('\n\x0fsame_name.proto\x12\x01\x62\"\x0e\n\x01\x42\x12\t\n\x01\x62\x18\x01 \x01(\x04\x62\x06proto3')
File "/home/user/protobuf_issue/protobuf_issue_venv/lib/python3.6/site-packages/google/protobuf/descriptor.py", line 879, in __new__
return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "same_name.proto":
same_name.proto: A file with this name is already in the pool.
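The name in the error message is the path of the .proto file relative to --proto_path, which is why both files register as same_name.proto in the steps above. A minimal sketch, assuming it is acceptable to compile both files from one shared root so that they register as a_proto/same_name.proto and b_proto/same_name.proto instead; the helper name build_protoc_command is hypothetical, and only the --proto_path and --python_out flags already used above are relied on:

```python
def build_protoc_command(root, out_dir, relative_proto_files):
    """Build a protoc command line that compiles files relative to one shared root.

    Hypothetical helper: with a single --proto_path, each file is registered
    under its path relative to that root rather than its bare file name.
    """
    return [
        "protoc",
        "--proto_path=" + root,
        "--python_out=" + out_dir,
    ] + list(relative_proto_files)


command = build_protoc_command(
    ".", ".", ["a_proto/same_name.proto", "b_proto/same_name.proto"]
)
print(" ".join(command))
```

The resulting list could be passed to subprocess.run(command, check=True). The generated modules would then land in a_proto/ and b_proto/ under the output directory, so the import paths change accordingly.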
Workaround
As I found online, you can avoid this problem by installing protobuf without the binary wheel:
pip uninstall protobuf
pip install --no-binary=protobuf protobuf
After doing that, the example above works as expected and no errors occur.
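To confirm which implementation is active before and after the reinstall, something like the following may help. It relies on google.protobuf.internal.api_implementation, which is an internal module and may change between releases; the function name active_protobuf_backend is hypothetical.

```python
def active_protobuf_backend():
    """Return the name of the active protobuf backend, or None if protobuf is missing.

    Assumption: api_implementation.Type() reports a short string such as
    "cpp" (C++ extension) or "python" (pure Python).
    """
    try:
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None  # protobuf is not installed in this environment
    return api_implementation.Type()


backend = active_protobuf_backend()
print("protobuf backend:", backend)
```

If the value is "cpp", the binary wheel with the C++ extension is in use, which is the configuration in which the error above was observed.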
Expected behavior
I see that documentation for protobuf python generated files says:
The Python code generated by the protocol buffer compiler is completely unaffected by the package name defined in the .proto file. Instead, Python packages are identified by directory structure.
However, in my use case the .proto files are received from a remote server and compiled at runtime.
If I have to accept those rules, I have to parse each .proto file manually to extract its package name, create a folder for that package, and move the file into it. The compiler then parses the file again and ignores that important piece of data. That's not very convenient.
I hope to see a protoc flag, or some other mechanism, that makes the generated Python files respect the package name in the .proto file.
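For illustration, the manual step described above (extracting the package name and deriving a folder from it) could look roughly like this. It is a sketch, not a real .proto parser: the regular expression assumes the package statement sits on its own line and is not inside a block comment, and the names _PACKAGE_RE and package_directory are hypothetical.

```python
import re
from pathlib import Path

# Simplified pattern for a top-level `package foo.bar;` statement.
# Assumption: the statement is not hidden inside a block comment or a string.
_PACKAGE_RE = re.compile(r"^\s*package\s+([A-Za-z_][\w.]*)\s*;", re.MULTILINE)


def package_directory(proto_source, base_dir="."):
    """Return the directory a .proto file would be moved to, based on its package."""
    match = _PACKAGE_RE.search(proto_source)
    if match is None:
        return Path(base_dir)  # no package statement: keep the file at the base
    # Each dotted component of the package becomes one directory level.
    return Path(base_dir).joinpath(*match.group(1).split("."))


source = 'syntax = "proto3";\npackage a;\nmessage A { uint64 a = 1; }\n'
print(package_directory(source))
```

A file received at runtime could then be written to package_directory(source) / "same_name.proto" before protoc is invoked.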
Packages do not affect pure-Python protobuf if the user gets descriptors directly through the directory structure. However, there is a problem if the user looks up a descriptor through the descriptor pool. The documentation is out of date and we need to fix it.
We have two implementations of Python protobuf: one is pure Python and the other uses the C++ extension. The Python binary wheels use the C++ extension implementation, which correctly checks for duplicate registration in the descriptor pool.
Pure Python used to not check for duplicate descriptor registration, but we added the duplicate check to the descriptor pool a few months ago: https://github.com/protocolbuffers/protobuf/blob/66540237ca212401ec1e279224b8db40f52e4ab9/python/google/protobuf/descriptor_pool.py#L143
However, it looks like the duplicate registration check for FileDescriptor was missing in pure Python.
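A toy model of the behavior reported in this thread may make the collision easier to see. It is not protobuf's code: ToyDescriptorPool is a hypothetical, heavily simplified class that assumes files are keyed by file name only, as the error message suggests.

```python
class ToyDescriptorPool:
    """Simplified stand-in for a descriptor pool; not the real protobuf class."""

    def __init__(self):
        self._files = {}  # file name -> package declared in that file

    def add_file(self, name, package):
        # The key is the file name alone, so a different package
        # does not make the second registration distinct.
        if name in self._files:
            raise TypeError(
                "{}: A file with this name is already in the pool.".format(name)
            )
        self._files[name] = package


pool = ToyDescriptorPool()
pool.add_file("same_name.proto", "a")

message = None
try:
    pool.add_file("same_name.proto", "b")  # same file name, different package
except TypeError as error:
    message = str(error)
print(message)
```

In this model, registering the second file under a different name (for example one that includes its directory) would succeed, regardless of the package it declares.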
Does the added check verify whether the provided package fields differ? That is the core point of this issue.
This issue is still there. Do you guys have any solution for it?
I believe the issue is still present when extracting descriptors from a descriptor pool. Is there any update on when a fix can be added?
Note: I cannot simply download the pure python implementation of protobuf as it will slow everything down.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive because the last activity was over 90 days ago.
I don't think it has been fixed.