zserio
zserio copied to clipboard
Emit one Python module per Zserio package
Please consider changing the python-module-per-zserio-symbol behavior. It would be really nice if I could just write
from my.package import Struct
rather than
from my.package.Struct import Struct
Aside from the PEP-8 clash, the latter just feels very redundant, and occasionally I forget that I have to import the nested Symbol and obviously get very weird errors. It would also bring the imports in Python more in line with the imports in Zserio. Thanks!
I think this issue is also relevant for other extensions (e.g. C++, see #274), but the solution might be more straightforward for Python.
Edit for visitors:
Use the generated api module as a workaround:
from my.api.package import Struct
Thank you to bring this topic up.
Your idea one Python module per Zserio package is very good and I guess that Python users will be very happy with this very intuitive solution.
We have even tried to implement this during the first Python generator design. Unfortunately, we have run to difficulties for which we were not able to find out any easy solution. It might be that we overlooked something, so it is worth to discuss.
Difficulties mean the symbol dependencies. Zserio compiler naturally allows dependencies which are defined after reference.
Example:
const uint32 A = B;
const uint32 B = 10;
struct Main
{
Item item;
};
struct Item
{
uint32 value;
};
Python does not allow such dependencies, each referenced symbol must be defined before it is used. We were not able to find out in Python something like forward declarations in C++. There are some workarounds available (like using class decorators) but this will be very difficult to implement in generated code in generic way.
We might think to restrict language not to allow such dependencies. This could be probably less pain for constants. But for normal types, this is a basic stone of the zserio language which will be required by users.
Quite often, we have a conflict between supported languages. Although, some solution makes perfect sense for one language (Python), it does not make any sense for another (Java). So if we restrict zserio compiler core to satisfy needs of one language, the users of different language will be very unpleasantly surprised.
Regarding of the problem with imports, we understand that this is a pain and error prone. Because of that each package has generated api.py module which imports all package symbols. It could be useful. Example:
import my.package.api as api
struct = api.Struct()
Thanks a lot for taking the time to write this detailed explanation @mikir ! I didn't know that zserio allows such declarations.
The best fix that I can think of would be to move the symbol imports from the api module into an __init__.py. Given your example above (let's call it package.zs):
const uint32 A = B;
const uint32 B = 10;
struct Main
{
Item item;
};
struct Item
{
uint32 value;
};
Zserio would need to generate:
package/
├── __init__.py
├── a.py
├── b.py
├── item.py
└── main.py
The contents of a.py, b.py, item.py and main.py would be the same as what is currently generated, but with lowercase filenames to establish PEP-8 compliance (relating to #285 ). The content of __init__.py would then become:
import .a, .b, .item, .main
A = a.A
B = b.B
Item = item.Item
Main = main.Main
That would solve the issue I think. Personally, I would even support prefixing the symbol-modules with an underscore to denote them as private and discourage deeply nested import statements.
Yes, I agree, __init__.py would be better. This was even implemented but then we have run to the different difficulties and we had to get rid of __init__.py.
Difficulties were called 'Circular Import Problem'. And we were not able to solve this in generic way. But again, we could forgot for some easy solution. The following explains the problem.
Consider the following python project layout:
outer
├───inner
│ └───Bar.py
│ └───Foo.py
│ └───__init__.py
└───__init__.py
Source outer/__init__.py:
import outer.inner as inner
Source outer/inner/Bar.py:
class Bar():
BAR_VALUE = 0
Source outer/inner/Foo.py:
import outer.inner.Bar
class Foo:
def __init__(self, bar: outer.inner.Bar.Bar) -> None:
self._bar = bar # type: outer.inner.Bar.Bar
Source outer/inner/__init__.py:
from outer.inner.Foo import Foo
from outer.inner.Bar import Bar
Then trying to import module outer by command import outer given the following error:
>>> import outer
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "outer/__init__.py", line 1, in <module>
import outer.inner as inner
File "outer/inner/__init__.py", line 1, in <module>
from outer.inner.Foo import Foo
File "outer/inner/Foo.py", line 3, in <module>
class Employee:
File "outer/inner/Foo.py", line 4, in Employee
def __init__(self, bar: outer.inner.Bar.Bar) -> None:
AttributeError: partially initialized module 'outer' has no attribute 'inner' (most likely due to a circular import)
This error can be solved by removing full module paths in outer/inner/Foo.py:
import outer.inner.Bar as Bar
class Foo:
def __init__(self, bar: Bar.Bar) -> None:
self._bar = bar # type: Bar.Bar
But this solution would break completely the clashing resolution in the generated Python sources.
Therefore, we have introduced api.py and we were happy that we were able to find out at least some generic solution for this problem. :-) If we overlooked something, we will be happy to know it.
One of the problems in this case is import outer.inner as inner in outer's __init__.py. It's redundant. Python automatically detects submodules.
Edit: Please see next comment.
~~Edit: And yes, the full-path imports in the nested modules too (just use from .bar import Bar).~~
~~Edit 2: I see, having the same symbol name used in two packages. But then you can still use relative imports.~~
An easy solution might be to hide the original import structure completely behind an _impl (or similar) package:
outer/
├── _impl/
│ ├── __init__.py
│ ├── bar.py
│ └── inner/
│ ├── __init__.py
│ ├── bar.py
│ └── foo.py
├── __init__.py
└── inner/
└── __init__.py
@josephbirkner In the meantime please consider to use the generated api.py for importing the generated code. It was designed to make such imports simple. Importing the top level api.py will import whole zserio tree exactly as it's defined in the source.
For example, let's have the following structure of Zserio packages
my
package1
TestStruct
TestUnion
package2
TestStruct
TestEnum
Regardless of what it generates underneath, you can just import the top level api.py and you will get access to all Zserio entites defined in source:
import my.api as my
field1 = 10
# please note that there is no duplicit TestStruct in the path
# and that the path exactly equals to path defined in Zserio sources
testStruct = my.package1.TestStruct(field1)
print(testStruct.field1)
# ...
TestEnum = my.package2.TestEnum
testEnum = TestEnum(TestEnum.Values.ONE)
# ...
Also please note that we generate the code. It's always quite easy to fix one particular example by hand, but it wasn't too easy for us to solve it in general. In the time when we implemented pyhton support, we weren't able to solve issues with __init__.py and so we decided to use api.py. It seems to me that in fact it's similar approach as your _impl, just the api.py is interleaved in the generated code.
If we decide to change the way we do it, we must be absolutely sure that we are able to solve all the use-cases generally, so we must consider it very carefully.
Sounds very good. Sorry for my trail of edits, I had to go through quite a thought process myself😅 The api-way works nicely for now. And it looks like we have worked out a solution for this ticket in the future 👍
Don't worry about edits. :-) It's not so easy and obvious. It is good to discuss. The more heads, the more knowledge. It might be that we find out something better in the future.
Another problem with emitting one Python module per Zserio package is cyclic imports which are allowed in Zserio.
Also note that the approach with separate module for each Zserio entity works without need of cyclic dependencies in target language because Zserio has only limited support of recursion - only "direct" recursion is allowed. Thus when e.g. StructureA needs field of StructureB, it can never happen that also StructureB will need StructureA as a field at once.
That makes sense. I am very ok with the current solution, so you may close this issue at your convenience 😃