
bindings(python): better message type

Open AzideCupric opened this issue 6 months ago • 10 comments

Congratulations on the release of the first version of the Python bindings! 🎉

As I mentioned in issue #559, and as the bindings/python documentation states:

Access to data of message types (e.g., Greeting) is currently only available through dictionary representations.

To address this, I attempted to implement a library that facilitates the conversion between dict and Python classes using msgspec. You can check it out here.

I chose msgspec over pydantic because msgspec is lightweight and sufficient for the task. However, it can be easily replaced with pydantic if necessary.

First, I adjusted the original dict structure, which currently looks like this:

{
  "Ok": {
      "tag": "a001",
      "code": null,
      "text": "Message 17 is the first unseen message"
  }
}

Currently, enum variant names are used as keys. Since a variant name can serve as the tag in a tagged union, I moved it into a codec_model field:

{
  "codec_model": "Ok",
  "tag": "a001",
  "code": null,
  "text": "Message 17 is the first unseen message"
}

As a special case, if the value isn't an object, or if it is itself a Rust enum variant object, it is placed in a codec_data field (as implemented in utils.py):

{
  "Unseen": 17
}
// transforms to
{
  "codec_model": "Unseen",
  "codec_data": 17
}
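To make the transformation concrete, here is a minimal sketch of the two cases shown above. It is not the code from utils.py: the function names `flatten_variant` and `unflatten_variant` are made up for illustration, it only handles a single top-level `{variant: payload}` dict, and it does not recurse or cover the nested enum variant case, which needs knowledge of the schema.

```python
from typing import Any

TAG_FIELD = "codec_model"   # field that holds the variant name
DATA_FIELD = "codec_data"   # field that wraps non-object payloads


def flatten_variant(value: dict[str, Any]) -> dict[str, Any]:
    """Turn {"Variant": payload} into a dict tagged with codec_model (hypothetical helper)."""
    if len(value) != 1:
        raise ValueError("expected a single-key dict of the form {variant: payload}")
    (variant, payload), = value.items()
    if isinstance(payload, dict):
        # Object payload: its fields sit next to the tag.
        return {TAG_FIELD: variant, **payload}
    # Non-object payload (number, string, list, ...): wrap it in codec_data.
    return {TAG_FIELD: variant, DATA_FIELD: payload}


def unflatten_variant(value: dict[str, Any]) -> dict[str, Any]:
    """Inverse of flatten_variant for the same two cases (hypothetical helper)."""
    rest = {k: v for k, v in value.items() if k != TAG_FIELD}
    variant = value[TAG_FIELD]
    if set(rest) == {DATA_FIELD}:
        return {variant: rest[DATA_FIELD]}
    return {variant: rest}
```

Applied to the examples above, `flatten_variant({"Unseen": 17})` gives the `codec_model`/`codec_data` form, and `unflatten_variant` restores the original shape.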

Next, I created msgspec structs in the models directory that build on this structure. Unlike Rust enums, Python enums cannot carry per-variant data the way algebraic data types (ADTs) do, so I used Union to simulate them:

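from msgspec import Struct

# NoZeroUint, StatusKind, Code, and StatusBody are defined elsewhere (not shown here).
class TaggedBase(Struct, tag_field="codec_model"):
    pass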

# Example where value isn't an object
class Unseen(TaggedBase):
    codec_data: int

# Example where value is an object
class AppendUid(TaggedBase):
    uid_validity: NoZeroUint
    uid: NoZeroUint

# Example where the value is a Rust enum variant object
class Untagged(TaggedBase):
    kind: StatusKind
    code: Code | None
    text: str

class Tagged(TaggedBase):
    tag: str
    body: StatusBody

class Bye(TaggedBase):
    code: Code | None
    text: str

class Status(TaggedBase):
    codec_data: Untagged | Tagged | Bye

And so on...
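For readers unfamiliar with tagged unions, the following stdlib-only sketch shows roughly what the tag-based dispatch amounts to. It deliberately uses dataclasses and a hand-rolled registry instead of msgspec, so it is a stand-in for the idea, not the actual implementation. The names `tagged`, `REGISTRY`, `load`, and `dump` are hypothetical, and plain `int` replaces `NoZeroUint` to keep it short.

```python
from dataclasses import asdict, dataclass
from typing import Any

TAG_FIELD = "codec_model"
REGISTRY: dict[str, type] = {}  # maps a tag value to the class it selects


def tagged(cls):
    """Make cls a dataclass and register it under its class name (the tag)."""
    cls = dataclass(cls)
    REGISTRY[cls.__name__] = cls
    return cls


@tagged
class Unseen:
    codec_data: int


@tagged
class AppendUid:
    uid_validity: int  # NoZeroUint in the issue; plain int here for brevity
    uid: int


def load(value: dict[str, Any]):
    """Pick the class named by codec_model and build it from the other keys."""
    data = dict(value)  # copy so the caller's dict is not modified
    cls = REGISTRY[data.pop(TAG_FIELD)]
    return cls(**data)


def dump(obj) -> dict[str, Any]:
    """Write the class name back as the tag, followed by the fields."""
    return {TAG_FIELD: type(obj).__name__, **asdict(obj)}
```

With this, `load({"codec_model": "Unseen", "codec_data": 17})` builds an `Unseen` instance, and `dump` turns it back into the same dict. msgspec does the equivalent dispatch from the `tag_field` setting and additionally validates field types, which this sketch does not.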

In my repository, I followed imap-types to define all the structures that the Python bindings will use. I also defined some helper functions in validate.py:

_, command = type_codec_decode(CommandCodec, b"ABCD UID FETCH 1,2:* (BODY.PEEK[1.2.3.4.MIME]<42.1337>)\r\n")
>>> Command(
    tag="ABCD",
    body=Fetch(
        sequence_set=[Single(codec_data=Value(codec_data=1)), Range(codec_data=(Value(codec_data=2), "Asterisk"))],
        macro_or_item_names=MessageDataItemNames(
            codec_data=[NameBodyExt(section=Mime(codec_data=[1, 2, 3, 4]), partial=(42, 1337), peek=True)]
        ),
        uid=True,
    ),
)

type_codec_encode(command).dump()
>>> b"ABCD UID FETCH 1,2:* (BODY.PEEK[1.2.3.4.MIME]<42.1337>)\r\n"

model_dump(command)
>>> {
    "tag": "ABCD",
    "body": {
        "Fetch": {
            "sequence_set": [{"Single": {"Value": 1}}, {"Range": [{"Value": 2}, "Asterisk"]}],
            "macro_or_item_names": {
                "MessageDataItemNames": [
                    {"BodyExt": {"section": {"Mime": [1, 2, 3, 4]}, "partial": [42, 1337], "peek": True}}
                ]
            },
            "uid": True,
        }
    },
}

You can find some tests in the tests directory, which can be run using pytest.


I haven’t tested all the structures yet, so there may be some structural mistakes. The main goal of this issue is to propose a possible way to improve the typing experience on the Python side, as I understand it. I’m also interested in any effective methods for testing that the structures in imap-types and imap-codec-model stay consistent (checking this by hand seems a bit overwhelming).

If it's feasible, perhaps this could be integrated into the Python bindings library. I’d be happy to contribute further!

AzideCupric, Aug 05 '24 16:08