typeshare

Support Python with pydantic

Open higumachan opened this issue 2 years ago • 5 comments

I regularly exchange data between Rust and Python (with pydantic). When I saw this project, I really wanted to be able to incorporate it into my workload.

Would contributions that add Python support be welcome?

Also, it would be helpful if you could share any concerns you have with python support.

higumachan avatar Nov 24 '22 04:11 higumachan

Started giving this a shot. Here are a couple initial thoughts:

Order of globals matters

type MyTypeAlias = MyType
struct MyType{}

By default this library writes out globals in a fixed order: https://github.com/1Password/typeshare/blob/e461bb680bb1b0f943bee6c38498eaf1027f62e9/core/src/language/mod.rs#L30-L40

The Python implementation will need to override this and topologically sort globals to write them out in a valid order. I was generally able to do this but:

  • It requires a pretty invasive override of generate_types
  • We'd have to either bring in a dependency or vendor a topological sorting algorithm
  • We need to error out if there are cycles (Python doesn't support this)

I'd consider changing the default to topological sorting since I don't think that will be a problem for any other language and is likely more readable.
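To make the ordering requirement concrete, here is a minimal sketch of a vendored topological sort with cycle detection, using only the standard library. The function name topo_sort and the map-of-names input are hypothetical and are not part of typeshare; a real implementation would work on the parsed types instead of plain strings.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: orders global names so that every dependency comes
/// before the globals that refer to it. On a cycle, returns `Err` with the
/// name of one global that takes part in it.
pub fn topo_sort(deps: &BTreeMap<String, Vec<String>>) -> Result<Vec<String>, String> {
    fn visit(
        name: &str,
        deps: &BTreeMap<String, Vec<String>>,
        done: &mut BTreeSet<String>,
        in_progress: &mut BTreeSet<String>,
        out: &mut Vec<String>,
    ) -> Result<(), String> {
        if done.contains(name) {
            return Ok(());
        }
        let children = match deps.get(name) {
            Some(children) => children,
            // Not a generated global (for example a builtin): nothing to emit.
            None => return Ok(()),
        };
        // Reaching a name that is still being visited means we followed a cycle.
        if !in_progress.insert(name.to_string()) {
            return Err(name.to_string());
        }
        for child in children {
            visit(child, deps, done, in_progress, out)?;
        }
        in_progress.remove(name);
        done.insert(name.to_string());
        out.push(name.to_string());
        Ok(())
    }

    let mut done = BTreeSet::new();
    let mut in_progress = BTreeSet::new();
    let mut out = Vec::new();
    for name in deps.keys() {
        visit(name, deps, &mut done, &mut in_progress, &mut out)?;
    }
    Ok(out)
}
```

With MyTypeAlias depending on MyType, this returns MyType first, which is the order Python needs. The BTreeMap keeps the output deterministic between runs.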

Mutable state on Language

Python needs some sort of mutable state to keep track of the imports it needs (from pydantic import BaseModel, from typing import Dict, etc.)

Each Language is supposed to be able to have mutable state:

https://github.com/1Password/typeshare/blob/main/core/src/language/mod.rs#L21

But the trait methods are declared with &self so self is immutable:

https://github.com/1Password/typeshare/blob/main/core/src/language/mod.rs#L27

I think it should have been &mut self but changing that would be a breaking change (it seems like Rust doesn't have the ability to "refine" traits to a more-general version w.r.t. mutability).

I got around this using RefCell but this can lead to runtime panics and makes things a lot more clunky.
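For illustration, a minimal sketch of the RefCell workaround. The Language trait below is a hypothetical one-method stand-in, not typeshare's real trait, and write_field, add_import and import_lines are made-up names; only the pattern of recording imports behind &self is the point.

```rust
use std::cell::RefCell;
use std::collections::BTreeSet;

/// Hypothetical stand-in for the real trait: methods only get `&self`.
pub trait Language {
    fn write_field(&self, out: &mut String, name: &str, ty: &str);
}

#[derive(Default)]
pub struct Python {
    // Interior mutability lets `&self` methods record (module, name) imports.
    imports: RefCell<BTreeSet<(String, String)>>,
}

impl Python {
    fn add_import(&self, module: &str, name: &str) {
        // `borrow_mut` panics at runtime if another borrow is still alive,
        // so the borrow is confined to this single statement.
        self.imports
            .borrow_mut()
            .insert((module.to_string(), name.to_string()));
    }

    /// Renders the collected imports as Python import statements.
    pub fn import_lines(&self) -> Vec<String> {
        self.imports
            .borrow()
            .iter()
            .map(|(module, name)| format!("from {} import {}", module, name))
            .collect()
    }
}

impl Language for Python {
    fn write_field(&self, out: &mut String, name: &str, ty: &str) {
        if ty.starts_with("Dict[") {
            self.add_import("typing", "Dict");
        }
        out.push_str(&format!("    {}: {}\n", name, ty));
    }
}
```

Keeping each borrow inside one statement avoids the overlapping-borrow panics, but the compiler no longer checks this, which is the clunkiness mentioned above.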

Python versions and typing-extensions

The best way to represent a unit enum or a tagged union with a string discriminant

enum Foo {
  Bar,
  Baz,
}

#[typeshare]
#[serde(tag = "type", content = "content")]
pub enum EnumWithManyVariants {
    UnitVariant,
    TupleVariantString(String),
}

Is

from typing import Literal

from pydantic import BaseModel

Foo = Literal["Bar", "Baz"]

class EnumWithManyVariantsUnitVariant(BaseModel):
  type: Literal["UnitVariant"]

class EnumWithManyVariantsTupleVariantString(BaseModel):
  type: Literal["TupleVariantString"]
  content: str

EnumWithManyVariants = EnumWithManyVariantsUnitVariant | EnumWithManyVariantsTupleVariantString

Python typing features differ between Python versions. For example, Literal has only been in the standard typing module since Python 3.8. For older versions you get it from typing-extensions, which is a third-party dependency. Thus, we'd have to add:

import sys
if sys.version_info < (3, 8):
  from typing_extensions import Literal
else:
  from typing import Literal

And users would have to know to add the typing-extensions dependency if needed. I think this is doable, but I'd leave it out of the first prototype. So initially I'd target Python 3.10+.

Knowing type variables and imports ahead of time

This library works sort of like a visitor pattern: it visits types/structs and calls back to write them out. Python has to write out imports, type variables and such before the definitions that use them, but it only learns what it needs while traversing the structs, so doing everything in a single pass is awkward. The way I implemented this was to write the body into a separate buffer (anything implementing io::Write), recording imports in a list along the way, and then to write out the imports followed by a copy of the buffered body. I think it would make more sense for Python to do two traversals with a sort in between:

  1. Visit all of the types and collect imports, type variables and globals.
  2. Topologically sort the globals.
  3. Visit every global and write it out.

This is currently possible but once again it would be super invasive because it requires rewriting or duplicating a lot of the logic in mod.rs.
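As a rough sketch of the collect-then-write idea, assuming a heavily simplified model: StructDef, collect_imports and write_module are hypothetical names, field types are already rendered as Python strings, and the definitions are assumed to arrive in a valid order (sorting is left out here).

```rust
use std::collections::BTreeSet;
use std::io::{self, Write};

/// Hypothetical, simplified model of a parsed struct.
pub struct StructDef {
    pub name: String,
    /// (field name, already-rendered Python type)
    pub fields: Vec<(String, String)>,
}

/// Pass 1: walk every definition and collect the imports it needs.
pub fn collect_imports(defs: &[StructDef]) -> BTreeSet<&'static str> {
    let mut imports = BTreeSet::new();
    imports.insert("from pydantic import BaseModel");
    for def in defs {
        for (_, ty) in &def.fields {
            if ty.contains("Dict[") {
                imports.insert("from typing import Dict");
            }
            if ty.contains("List[") {
                imports.insert("from typing import List");
            }
        }
    }
    imports
}

/// Pass 2: write the import header first, then every definition.
pub fn write_module<W: Write>(w: &mut W, defs: &[StructDef]) -> io::Result<()> {
    for line in collect_imports(defs) {
        writeln!(w, "{}", line)?;
    }
    for def in defs {
        writeln!(w, "\n\nclass {}(BaseModel):", def.name)?;
        if def.fields.is_empty() {
            writeln!(w, "    pass")?;
        }
        for (name, ty) in &def.fields {
            writeln!(w, "    {}: {}", name, ty)?;
        }
    }
    Ok(())
}
```

Because the imports are known before anything is written, no second buffer or mutable state on the writer is needed in this version.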

Unpacking enum variants into classes

Since Python doesn't have ADTs we have to transform:

enum Foo<K, V, T> {
  Bar(HashMap<K, V>),
  Baz(Vec<T>),
}

Into something like:

class FooBarInner(Generic[K, V]):
  type: Literal["Bar"]
  content: Dict[K, V]

class FooBazInner(Generic[T]):
  type: Literal["Baz"]
  content: List[T]

This requires knowing all of the generics used by a given variant (not just the entire enum). I wrote some hacky recursive functions for this, but this seems like another candidate to generalize and provide to language implementations.
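A minimal sketch of such a recursive helper, assuming a simplified type tree. The Ty enum and generics_used are hypothetical and do not mirror typeshare's real type representation; the sketch only shows the per-variant collection of type parameters.

```rust
/// Hypothetical, simplified type tree.
pub enum Ty {
    /// A generic parameter such as `T`.
    Param(String),
    /// A named type with its type arguments, e.g. `HashMap<K, V>`.
    Named(String, Vec<Ty>),
}

/// Appends every generic parameter used inside `ty` to `out`, in order of
/// first appearance and without duplicates.
pub fn generics_used(ty: &Ty, out: &mut Vec<String>) {
    match ty {
        Ty::Param(name) => {
            if !out.contains(name) {
                out.push(name.clone());
            }
        }
        Ty::Named(_, args) => {
            for arg in args {
                generics_used(arg, out);
            }
        }
    }
}
```

Calling this once per variant payload gives the parameter list for that variant's class, so the Bar variant above would yield K and V while Baz would yield only T.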

adriangb avatar Nov 27 '22 20:11 adriangb

@InquisitivePenguin just curious if this is something you'd be interested in supporting? I think between https://github.com/1Password/typeshare/issues/20#issuecomment-1328332069 and #25 I've explored enough of the complexity of supporting Python that it would be good to step back and see if this is something you actually want.

adriangb avatar Nov 28 '22 08:11 adriangb

We would definitely be interested in supporting this! Thank you for the write-up and the work you've done. I'd be happy to assist with this effort once I can find the time.

snowsignal avatar Dec 01 '22 19:12 snowsignal

Awesome, just ping me whenever is a good time for you. Thanks for looking over this!

adriangb avatar Dec 01 '22 19:12 adriangb

Python 3.7 is almost EOL. I don't think it makes sense to support that. Literal is available since Python 3.8.

barseghyanartur avatar Jan 23 '23 13:01 barseghyanartur