protobuf
Python: Stop using global static state in C extension
What language does this apply to?
Python
Describe the problem you are trying to solve.
Would like to be able to use the Python extension in subinterpreters or after reload.
Because of the assignment to a C-static variable at https://github.com/protocolbuffers/protobuf/blob/6e349120845fa2454fa16de904a1637030b211d4/python/google/protobuf/pyext/message.cc#L3053, the error at https://github.com/protocolbuffers/protobuf/blob/6e349120845fa2454fa16de904a1637030b211d4/python/google/protobuf/pyext/message.cc#L216-L217 is triggered in subinterpreters or when the module is reloaded, because the message class may not be the same in a subinterpreter or after a reload as it was when the module was first loaded.
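(For reference, a rough sketch of the reload scenario; foo_pb2 is just a placeholder for any generated module, and the exact failure depends on the protobuf version.)

import importlib
import sys

import foo_pb2  # placeholder name for any generated _pb2 module

# Simulate a plugin-style reload: forget the cached modules and import again.
for name in list(sys.modules):
    if name == 'foo_pb2' or name.startswith('google.protobuf'):
        del sys.modules[name]

foo_pb2 = importlib.import_module('foo_pb2')  # fails on affected versions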
Describe the solution you'd like
Options:
- See PEP 3121. Start using module state instead of C-static state so it can be safely reloaded.
- Stop checking for module class extension using static state
Please try version 4.21.x -- the C extension is completely rewritten and does not use global static state anymore. The new code is here: https://github.com/protocolbuffers/upb/tree/main/python
I expect the new version should be friendly to sub-interpreters.
The code you are looking at is the old code, which will not be used in future versions of protobuf.
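(Not from the thread, but a quick way to confirm which version and backend are actually in use; api_implementation is an internal module, so treat this only as a debugging aid.)

import google.protobuf
from google.protobuf.internal import api_implementation

print(google.protobuf.__version__)  # e.g. '4.21.1'
print(api_implementation.Type())    # 'upb' for the rewritten extension,
                                    # 'cpp' for the old one, 'python' for pure Python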
Great! I will try that and report back. I fear that grpcio requires the older major version.
@cretz were you able to confirm this works for you?
I am afraid not, because Python gRPC sadly does not support 4.x. I am closing this issue, however, as it is unreasonable to expect this to apply to the 3.x versions. If I do continue to have issues with import caching on 4.x if/when I can ever get there, I will open a new issue.
Reopening. @haberman et al - I have upgraded to 4.x and am still experiencing global state and/or import reload issues.
If I use an existing google.protobuf import but reload something that imports it (e.g. blow away its entry in sys.modules and re-import it), I get:
    def BuildMessageAndEnumDescriptors(file_des, module):
      """Builds message and enum descriptors.

      Args:
        file_des: FileDescriptor of the .proto file
        module: Generated _pb2 module
      """

      def BuildNestedDescriptors(msg_des, prefix):
        for (name, nested_msg) in msg_des.nested_types_by_name.items():
          module_name = prefix + name.upper()
          module[module_name] = nested_msg
          BuildNestedDescriptors(nested_msg, module_name + '_')
        for enum_des in msg_des.enum_types:
          module[prefix + enum_des.name.upper()] = enum_des

>     for (name, msg_des) in file_des.message_types_by_name.items():
E     AttributeError: 'NoneType' object has no attribute 'message_types_by_name'

.venv\lib\site-packages\google\protobuf\internal\builder.py:64: AttributeError
This is because the generated code's call to _descriptor_pool.Default().AddSerializedFile(...) is returning None on the second invocation. I can't tell where in the C code it bails on the second invocation (I was hoping I wouldn't have to debug that far).
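(Roughly, the failing pattern looks like the sketch below, using empty.proto's serialized descriptor; this is an illustration, not the exact generated code.)

from google.protobuf import descriptor_pool
from google.protobuf import empty_pb2

serialized = empty_pb2.DESCRIPTOR.serialized_pb

# A generated _pb2 module does essentially this at import time.
first = descriptor_pool.Default().AddSerializedFile(serialized)

# Re-running the generated module (e.g. after its sys.modules entry was dropped)
# performs the same call again; on the affected 4.21 builds this second call
# reportedly returned None instead of the already-registered FileDescriptor.
second = descriptor_pool.Default().AddSerializedFile(serialized)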
If I also reload the google.protobuf module, then I get:
.venv\lib\site-packages\google\protobuf\empty_pb2.py:19: in <module>
    _builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'google.protobuf.empty_pb2', globals())
.venv\lib\site-packages\google\protobuf\internal\builder.py:108: in BuildTopDescriptorsAndMessages
    module[name] = BuildMessage(msg_des)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

msg_des = <google._upb._message.Descriptor object at 0x000002C192466820>

    def BuildMessage(msg_des):
      create_dict = {}
      for (name, nested_msg) in msg_des.nested_types_by_name.items():
        create_dict[name] = BuildMessage(nested_msg)
      create_dict['DESCRIPTOR'] = msg_des
      create_dict['__module__'] = module_name
>     message_class = _reflection.GeneratedProtocolMessageType(
          msg_des.name, (_message.Message,), create_dict)
E     TypeError: A Message class can only inherit from Message, not (<class 'google.protobuf.message.Message'>,)

.venv\lib\site-packages\google\protobuf\internal\builder.py:85: TypeError
This appears to be because the C code checks against PythonMessage_class, which is a static value set when the module was first loaded. Of course, everything works fine if I set the PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION env var to python.
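(For completeness: the workaround only takes effect if the variable is set before google.protobuf is first imported in the process, e.g.:)

import os

# Must be set before the first import of google.protobuf in this process;
# once the C/upb extension has been loaded, the setting is ignored.
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'

from google.protobuf import empty_pb2  # now backed by the pure-Python implementation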
The first error ("AttributeError: 'NoneType' object has no attribute 'message_types_by_name'") is the bug described in https://github.com/protocolbuffers/protobuf/issues/10075. A fix for this has been committed and will be released imminently.
For the second error, your link points to code that is no longer used in 4.x. The 4.x code is here: https://github.com/protocolbuffers/upb/blob/41335a03becd3df1371de035dcbc787c4ce49ae3/python/message.c#L1772-L1780
The new code stores the class in module state, so it's not immediately obvious what the problem is here.
A fix for this has been committed and will be released imminently.
Awesome, thanks
> no longer used in 4.x
Ah thanks! I followed from the top of message_factory.py and may have confused myself. It appears that PyUpb_ModuleState_Get references a static module def instead of using PyUpb_ModuleState_GetFromModule, but I admit I am not familiar with how that works. From the static ref, though, one would expect PyUpb_ModuleState_Get to return the same value on each invocation, regardless of whether the module was reloaded.
Looking at the functions at https://docs.python.org/3/c-api/module.html#module-lookup, I think that the interpreter keeps a PyModuleDef *def -> PyObject *module map that stores the current module for a given ModuleDef. Perhaps we need to be calling PyState_RemoveModule() in PyUpb_ModuleDealloc(), so that the next PyState_AddModule() can put the new module in the map.
Can you add a unit test for this in https://github.com/protocolbuffers/upb/tree/main/python and see if adding PyState_RemoveModule() makes the unit test pass?
I am not sure PyState_RemoveModule() would help. In theory it is possible, via subinterpreters and the like, to have the same module created separately more than once with both copies still active, so there would be no dealloc. Maybe the code could be changed from checking that a protobuf class extends the specific type created by the original module to checking that it extends that type by name. Technically there can be many google.protobuf.message.Message classes in existence in a single process, but the C code expects only one.
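(A rough Python-level illustration of the identity-versus-name distinction, simulating the situation by re-executing the pure-Python google.protobuf.message module; the actual check under discussion lives in the C code.)

import importlib
from google.protobuf import message as message_mod

OldMessage = message_mod.Message

# Re-executing the module rebinds Message to a brand-new class object.
NewMessage = importlib.reload(message_mod).Message

print(OldMessage is NewMessage)  # False: an identity check fails after the reload
print((OldMessage.__module__, OldMessage.__qualname__) ==
      (NewMessage.__module__, NewMessage.__qualname__))  # True: a name-based check still matches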
This isn't urgent for me because the soon-to-be-released fix for #10075 at least lets me re-import.
@haberman Understood that the issue is reported and fixed in the new version. Any idea which historical version does not have this problem while still working with Python 3.6? It seems I cannot find a good combination.
Unfortunately I think there might not be any combination that will work for you. Python 3.6 has been EOL for almost a year, we don't support it anymore, and the older versions that do support it don't have this fix.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive because the last activity was over 90 days ago.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.
This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.