typeshare
Support Python with pydantic
I regularly exchange data between Rust and Python (with pydantic). When I saw this project, I really wanted to be able to incorporate it into my workload.
Would contributions that add Python support be welcome?
Also, it would be helpful if you could share any concerns you have about Python support.
Started giving this a shot. Here are a couple initial thoughts:
Order of globals matters
type MyTypeAlias = MyType
struct MyType{}
By default this library writes out globals in a fixed order: https://github.com/1Password/typeshare/blob/e461bb680bb1b0f943bee6c38498eaf1027f62e9/core/src/language/mod.rs#L30-L40
The Python implementation will need to override this and topologically sort globals to write them out in a valid order. I was generally able to do this but:
- It requires a pretty invasive override of generate_types
- We'd have to either bring in a dependency or vendor a topological sorting algorithm
- We need to error out if there are cycles (Python doesn't support this)
I'd consider changing the default to topological sorting, since I don't think that will be a problem for any other language and the output is likely more readable.
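As a rough illustration of the ordering step described above, here is a minimal sketch in Python (not typeshare's Rust code). It assumes the dependencies between globals have already been collected into a mapping; the function name order_globals and that mapping are hypothetical. It uses the standard-library graphlib module (Python 3.9+), which also reports cycles.

```python
from graphlib import CycleError, TopologicalSorter


def order_globals(deps):
    """Return global names so that every name comes after the names it references.

    `deps` is a hypothetical mapping: global name -> set of names it refers to.
    """
    sorter = TopologicalSorter(deps)
    try:
        # static_order() yields dependencies before their dependents.
        return list(sorter.static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the detected cycle.
        cycle = " -> ".join(exc.args[1])
        raise ValueError(f"cyclic type definitions: {cycle}") from exc


# The alias refers to MyType, so MyType has to be written out first.
ordered = order_globals({"MyTypeAlias": {"MyType"}, "MyType": set()})
```

A Rust implementation would need an equivalent algorithm, either vendored or from a crate, which is the dependency question raised in the list above.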
Mutable state on Language
Python needs some sort of mutable state to keep track of the imports it needs (from pydantic import BaseModel, from typing import Dict, etc.)
Each Language is supposed to be able to have mutable state:
https://github.com/1Password/typeshare/blob/main/core/src/language/mod.rs#L21
But the trait methods are declared with &self so self is immutable:
https://github.com/1Password/typeshare/blob/main/core/src/language/mod.rs#L27
I think it should have been &mut self, but changing that would be a breaking change (it seems like Rust doesn't have the ability to "refine" traits to a more-general version w.r.t. mutability).
I got around this using RefCell, but this can lead to runtime panics and makes things a lot more clunky.
Python versions and typing-extensions
The best way to represent a unit enum or a tagged union with a string discriminant, such as
enum Foo {
    Bar,
    Baz,
}
#[typeshare]
#[serde(tag = "type", content = "content")]
pub enum EnumWithManyVariants {
    UnitVariant,
    TupleVariantString(String),
}
is:
Foo = Literal["Bar", "Baz"]
class EnumWithManyVariantsUnitVariant(BaseModel):
    type: Literal["UnitVariant"]
class EnumWithManyVariantsTupleVariantString(BaseModel):
    type: Literal["TupleVariantString"]
    content: str
EnumWithManyVariants = EnumWithManyVariantsUnitVariant | EnumWithManyVariantsTupleVariantString
Python typing features differ among Python versions. For example, Literal is only available from typing in Python 3.8+. For older versions you get it from typing-extensions, which is a third-party dependency. Thus, we'd have to add:
import sys
if sys.version_info < (3, 8):
    from typing_extensions import Literal
else:
    from typing import Literal
And users would have to know to add the typing-extensions dependency if needed. I think this is doable, but I'd hold off from the first prototype. So initially I'd target Python 3.10+.
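If older versions were supported later, the generator could pick the import header from a configured minimum version. The sketch below is only an illustration: the function name literal_import_lines and the idea of a "minimum supported version" setting are assumptions, not part of typeshare. It relies on Literal having been added to typing in Python 3.8.

```python
def literal_import_lines(min_version):
    """Return the import lines to emit for Literal.

    `min_version` is a hypothetical generator setting: the oldest
    (major, minor) Python version the generated module must run on.
    """
    if min_version >= (3, 8):
        # typing.Literal exists on every supported version; no shim needed.
        return ["from typing import Literal"]
    # Otherwise emit a runtime check that falls back to typing_extensions.
    return [
        "import sys",
        "if sys.version_info >= (3, 8):",
        "    from typing import Literal",
        "else:",
        "    from typing_extensions import Literal",
    ]


header = literal_import_lines((3, 10))
```

With a 3.10+ target, as proposed for the first prototype, only the plain typing import is emitted.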
Knowing type variables and imports ahead of time
This library works sorta like a visitor pattern: it visits types/structs and calls back to write them out. Since Python needs to write out imports, type variables and such before they are encountered, it can be a bit awkward to first traverse structs and write those out. The way I implemented this was by creating a second writer (an io::Write buffer) and using that to write out the body, mutating a list of imports along the way, then writing out the imports and copying the body buffer into the real output after them. I think it would make more sense for Python to do two traversals, with a sort in between:
- Visit all of the types and collect imports, type variables and globals.
- Topologically sort the globals.
- Visit every global and write it out.
This is currently possible but once again it would be super invasive because it requires rewriting or duplicating a lot of the logic in mod.rs.
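The collect, sort, write flow listed above can be sketched as follows. This is a Python illustration of the control flow only, not typeshare's Rust API; the dictionary shape used for each definition (name, deps, imports, body) is a made-up stand-in for the parsed Rust types.

```python
from graphlib import TopologicalSorter
from io import StringIO


def generate(definitions):
    """Two passes over hypothetical definition records, with a sort in between."""
    by_name = {d["name"]: d for d in definitions}

    # Pass 1: collect imports and dependency edges without writing anything.
    imports = {}
    deps = {}
    for d in definitions:
        deps[d["name"]] = {n for n in d["deps"] if n in by_name}
        for module, symbol in d["imports"]:
            imports.setdefault(module, set()).add(symbol)

    # Sort the globals so that dependencies come first.
    order = list(TopologicalSorter(deps).static_order())

    # Pass 2: the import header is known up front, so write it, then the bodies.
    out = StringIO()
    for module in sorted(imports):
        out.write(f"from {module} import {', '.join(sorted(imports[module]))}\n")
    out.write("\n")
    for name in order:
        out.write(by_name[name]["body"] + "\n")
    return out.getvalue()


source = generate([
    {"name": "MyTypeAlias", "deps": ["MyType"], "imports": [],
     "body": "MyTypeAlias = MyType"},
    {"name": "MyType", "deps": [], "imports": [("pydantic", "BaseModel")],
     "body": "class MyType(BaseModel):\n    pass"},
])
```

Because the imports are collected before anything is written, no temporary body buffer has to be copied into the final output.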
Unpacking enum variants into classes
Since Python doesn't have ADTs we have to transform:
enum Foo<K, V, T> {
    Bar(HashMap<K, V>),
    Baz(Vec<T>),
}
Into something like:
class FooBarInner(Generic[K, V]):
    type: Literal["Bar"]
    content: Dict[K, V]
class FooBazInner(Generic[T]):
    type: Literal["Baz"]
    content: List[T]
This requires knowing all of the generics used by a given variant (not just the entire enum). I wrote some hacky recursive functions for this, but this seems like another candidate to generalize and provide to language implementations.
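A minimal sketch of the kind of recursive helper described above, written in Python for illustration. The type representation (a name plus a list of argument types) and the function name used_generics are assumptions; the actual implementation would walk typeshare's own Rust type structures.

```python
def used_generics(ty, generic_names, found=None):
    """Collect the generic parameters a type mentions, in order of first use.

    `ty` is a hypothetical (name, args) pair, e.g. ("Vec", [("T", [])]).
    `generic_names` is the set of parameters declared on the whole enum.
    """
    if found is None:
        found = []
    name, args = ty
    if name in generic_names and name not in found:
        found.append(name)
    # Recurse into type arguments such as the K and V in HashMap<K, V>.
    for arg in args:
        used_generics(arg, generic_names, found)
    return found


enum_generics = {"K", "V", "T"}
bar = ("HashMap", [("K", []), ("V", [])])
baz = ("Vec", [("T", [])])
bar_params = used_generics(bar, enum_generics)  # parameters for FooBarInner
baz_params = used_generics(baz, enum_generics)  # parameters for FooBazInner
```

Each variant class would then be declared with Generic[...] over only the parameters returned for that variant.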
@InquisitivePenguin just curious if this is something you'd be interested in supporting? I think between https://github.com/1Password/typeshare/issues/20#issuecomment-1328332069 and #25 I've explored enough of the complexity of supporting Python that it would be good to step back and see if this is something you actually want.
We would definitely be interested in supporting this! Thank you for the write-up and the work you've done. I'd be happy to assist with this effort once I can find the time.
Awesome, just ping me whenever is a good time for you. Thanks for looking over this!
Python 3.7 is almost EOL. I don't think it makes sense to support that. Literal is available since Python 3.8.