
Python generated code is no longer compatible with Cython since 3.20

Open vthib opened this issue 2 years ago • 4 comments

Version: 4.21.8 Language: Python

Before the 3.20 update, the generated code statically defined all the messages and could be passed to Cython without issues. Since the 3.20 update, the messages are generated dynamically, which leads to a Cython error.

For example, given this proto file

syntax = "proto3";

message Foo {
    bool a = 1;
}

The codegen gives this:

$ protoc --python_out=gen proto/a.proto

# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: proto/a.proto
"""Generated protocol buffer code."""
from google.protobuf.internal import builder as _builder
from google.protobuf import descriptor as _descriptor
from google.protobuf import descriptor_pool as _descriptor_pool
from google.protobuf import symbol_database as _symbol_database
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()




DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\rproto/a.proto\"\x10\n\x03\x46oo\x12\t\n\x01\x61\x18\x01 \x01(\x08\x62\x06proto3')

_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals())
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'proto.a_pb2', globals())
if _descriptor._USE_C_DESCRIPTORS == False:

  DESCRIPTOR._options = None
  _FOO._serialized_start=17
  _FOO._serialized_end=33
# @@protoc_insertion_point(module_scope)

Cython can no longer compile this:

$ cython gen/proto/a_pb2.py
Error compiling Cython file:
------------------------------------------------------------
...
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals())
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'proto.a_pb2', globals())
if _descriptor._USE_C_DESCRIPTORS == False:

  DESCRIPTOR._options = None
  _FOO._serialized_start=17
 ^
------------------------------------------------------------

gen/proto/a_pb2.py:23:2: undeclared name not builtin: _FOO

Cython resolves names at compile time and does not track names that are added by modifying globals(), so it concludes that _FOO is never set.
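To illustrate the kind of static check involved, here is a rough sketch using Python's `ast` module. It is a simplification and not Cython's actual algorithm: it only collects module-level names that are read but never assigned, imported, or built in. The `SOURCE` string and the helper name `statically_unbound_names` are made up for this example.

```python
import ast
import builtins

# A trimmed-down stand-in for the generated module (illustrative only).
SOURCE = '''
from google.protobuf.internal import builder as _builder
DESCRIPTOR = object()
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals())
_FOO._serialized_start = 17
'''


def statically_unbound_names(source):
    """Return names that are read but never bound in the source text.

    The source is only parsed, never executed, so names injected at
    runtime through globals() are invisible to this check.
    """
    tree = ast.parse(source)
    assigned = set()
    loaded = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            else:
                loaded.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                assigned.add(alias.asname or alias.name.split(".")[0])
    return sorted(loaded - assigned - set(dir(builtins)))


print(statically_unbound_names(SOURCE))
```

A purely static view of the file has no way to know that the builder call will create `_FOO`, which matches the "undeclared name not builtin" error above.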

This isn't really a regression, since I suppose you do not support Cython. However, a small change could make this work with both plain Python and Cython:

Instead of generating this:

_FOO._serialized_start=17

generating this would fix the issue:

globals()["_FOO"]._serialized_start=17

This is not easy to fix by post-processing the generated files with regexes, sed, and the like, but it shouldn't be too hard to do in the code generator, I suppose.

Would you be OK with such a change? That would be really useful for cython users. I can try to make this change if needed.

vthib avatar Oct 20 '22 13:10 vthib

What is the use case for this? What is the expected behavior if you let Cython compile the generated code?

haberman avatar Oct 20 '22 16:10 haberman

The expected behavior is exactly the same as without the Cython pass. Cython is usually used to improve performance, but it can also be used for obfuscation and for protecting the source code.

vthib avatar Oct 20 '22 16:10 vthib

I see. That seems reasonable. We reference globals() several times, so I could see rewriting this to:

_globals = globals()
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals)
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'proto.a_pb2', _globals)

if _descriptor._USE_C_DESCRIPTORS == False:

  DESCRIPTOR._options = None
  _globals["_FOO"]._serialized_start=17

We could use _globals for any case where we are referencing a variable we did not directly assign.

I would worry some about the performance, except that _descriptor._USE_C_DESCRIPTORS will always be true unless we are using the pure-Python library, which is slow anyway. So I think we are ok there.

haberman avatar Oct 20 '22 16:10 haberman

Yes, that would work well!

vthib avatar Oct 21 '22 08:10 vthib

Are you working on this, @haberman? Otherwise, I can try to make a PR for it.

vthib avatar Nov 14 '22 08:11 vthib

I am not working on this currently. PRs welcome.

haberman avatar Nov 16 '22 16:11 haberman