python-betterproto
python-betterproto copied to clipboard
Naming collision with nested enums
Problem
Currently these two enums in the same file will be both compiled into the same name of ContentStatus
message Content {enum Status{ ... }}
enum ContentStatus{ ... }
Currently the behaviour is to silently overwrite one of the enums, therefore making all subsequent call of the enum possibly wrong.
Workaround
I have consulted @nat-n and they say they don't see a way to work around the issue currently, except to directly modify the generated code.
The ideal solution for this would be a (breaking) change for nested enums to instead compile to nested classes in python instead of relying on naming concatenation.
What is the policy on breaking changes of this project?
I am looking into this, I note it's still a problem and was able to generate an example of it.
I might need a tl;dr on how the enum ends up getting named and which python module in the generated tree it ends up in.
If I look at how this case is handled in other libraries/languages, I see that they put the nested enum into a module with the name of its parent
So in the tree there would be:
-
__init__.py
with the messageContent
andContentStatus
-
content.py
with the enumStatus
I am currently reading through betterproto/plugin/models.py
and betterproto/plugin/parser.py
to try and understand how the decision is made to put the enum Status
in the same file as the message Content
. Cannot say I have reached comprehension as yet.
I've reached some understanding after looking at the OutputTemplate
that is created for this nested enum example.
The avenue of creating another file, while not impossible (and IMO the preferable way to do this), will probably take too much work. The suggestion by @nat-n to make a nested class is probably the way to go, and is how I am going to approach this. Will make an attempt soon.
This is much harder than I thought...
I've managed to turn this protobuf
syntax = "proto3";
package nested_enum;
message Test {
enum Inner {
NONE = 0;
THIS = 1;
}
Inner status = 1;
message Doubly {
enum Inner {
NONE = 0;
THIS = 1;
}
Inner status = 1;
}
}
message TestInner {
int32 foo = 1;
}
message TestDoublyInner {
int32 foo = 1;
string bar = 2;
}
enum Outer {
foo = 0;
bar = 1;
}
message Content {
message Status {
string code = 1;
}
Status status = 1;
}
message ContentStatus {
int32 id = 1;
}
into this code
@dataclass(eq=False, repr=False)
class Test(betterproto.Message):
@dataclass(eq=False, repr=False)
class TestDoubly(betterproto.Message):
# It would be nice if I could strip the prefix from these class names
class TestDoublyInner(betterproto.Enum):
NONE = 0
THIS = 1
# this type hint is wrong; it is referencing the global TestDoublyInner
status: "TestDoublyInner" = betterproto.enum_field(1)
class TestInner(betterproto.Enum):
NONE = 0
THIS = 1
status: "TestInner" = betterproto.enum_field(1)
@dataclass(eq=False, repr=False)
class TestInner(betterproto.Message):
foo: int = betterproto.int32_field(1)
@dataclass(eq=False, repr=False)
class TestDoublyInner(betterproto.Message):
foo: int = betterproto.int32_field(1)
bar: str = betterproto.string_field(2)
@dataclass(eq=False, repr=False)
class Content(betterproto.Message):
@dataclass(eq=False, repr=False)
class ContentStatus(betterproto.Message):
code: str = betterproto.string_field(1)
status: "ContentStatus" = betterproto.message_field(1)
@dataclass(eq=False, repr=False)
class ContentStatus(betterproto.Message):
id: int = betterproto.int32_field(1)
class Outer(betterproto.Enum):
foo = 0
bar = 1
See the inline comments for two of the problems I'm having so far.
I've parked my progress in a branch. If someone else could give it a glance that would be good, maybe I'm off-track or approaching this the wrong way.
https://github.com/danielgtaylor/python-betterproto/commit/b9ec59f2a3e95afe5163197096adb91ade7ec36b
Hey @cetanu, nice work, you're close.
In my view, the ideal approach is for the Python module/class structure to mirror that of the proto config. Unless I'm missing something following this principle should have the most elegant and least surprising or problematic result.
So just to be clear I think what we want is for the Test class to look like so:
@dataclass(eq=False, repr=False)
class Test(betterproto.Message):
@dataclass(eq=False, repr=False)
class Doubly(betterproto.Message):
class Inner(betterproto.Enum):
NONE = 0
THIS = 1
status: "Test.Doubly.Inner" = betterproto.enum_field(1) # Note the use of fully qualified class name here
class Inner(betterproto.Enum):
NONE = 0
THIS = 1
status: "Test.Inner" = betterproto.enum_field(1)
I guess you agree?
So the problem is how to avoid using concatenated class names? Do you know where the concatenation is taking place? Could it be refactored to represent class identification with an object that retains the structure so it can be formatted in a context appropriate way? (e.g. a tuple subclass, sometimes so you can use take self[0], sometimes you take ".".join(self) ).
I'm afraid making progress in this codebase requires a willingness to refactor :P
I don't have time/motivation to really dig through the code right now, but I'm happy to offer this kind of high level collaboration.
I agree completely, I just haven't quite worked out:
- where the names are concatenated
- where the type hints are also formed However what you've put as an example is the ultimate goal.