
Implementing a mixin for Flatbuffers

Open timrulebosch opened this issue 2 years ago • 4 comments

Is your feature request related to a problem? Please describe. I'm interested in supporting Flatbuffers via a mixin. I already have a dataclass-based encoder/decoder which uses getattr(...) to call the generated Flatbuffer code and loads modules with importlib.import_module(). However, it could be much faster if the encoding/decoding code were generated once per schema.

Describe the solution you'd like So far I can see how to implement the various serialization hooks (pre/post), but what would be the best way to implement the field serialization?

Generally, the code for each hook needs to, based on the table/field name, load a module, call getattr() to find the right method, and then somehow emit the code in a form the code builder can use. Possibly a default_encoder (Encoder)? Essentially, at some point I need the list of fields and a way to emit the function calls that encode/decode the data.

The pre/post hooks would take care of the "framing" of the Flatbuffer table (e.g. calling Start() and End(), as well as creating a buffer at some point).

Describe alternatives you've considered Currently I use getattr() calls each time a dataclass is serialized. I would like to generate the code only once, based on the dataclass, and hopefully get a significant performance boost.
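To make the "generate once" idea concrete, here is a minimal sketch that builds the encoder source from the dataclass fields and compiles it a single time with exec(). FakeBuilder, FakeWeaponTable and compile_encoder are hypothetical stand-ins so the snippet runs without flatbuffers installed; they are not mashumaro or flatbuffers APIs. Deriving the method name with "Add" + capitalize() is also a simplifying assumption that only holds for single-word field names.

```python
from dataclasses import dataclass, fields


class FakeBuilder:
    """Hypothetical stand-in for flatbuffers.Builder; it only records calls."""

    def __init__(self):
        self.calls = []

    def CreateString(self, value):
        self.calls.append(("CreateString", value))
        return len(self.calls)  # pretend offset


class FakeWeaponTable:
    """Hypothetical stand-in for the generated MyGame.Sample.Weapon module."""

    @staticmethod
    def Start(builder):
        builder.calls.append(("Start",))

    @staticmethod
    def AddName(builder, offset):
        builder.calls.append(("AddName", offset))

    @staticmethod
    def AddDamage(builder, value):
        builder.calls.append(("AddDamage", value))

    @staticmethod
    def End(builder):
        builder.calls.append(("End",))
        return len(builder.calls)


@dataclass
class Weapon:
    name: str
    damage: int


def compile_encoder(cls, table):
    """Generate the encoder source for cls once and compile it with exec()."""
    namespace = {"_start": table.Start, "_end": table.End}
    lines = ["def encode(obj, builder):"]
    # Strings have to be created before the table is started.
    for f in fields(cls):
        if f.type is str:
            lines.append(f"    {f.name}_offset = builder.CreateString(obj.{f.name})")
    lines.append("    _start(builder)")
    for f in fields(cls):
        adder = "_add_" + f.name
        # getattr() runs here, at build time, not on every serialization.
        namespace[adder] = getattr(table, "Add" + f.name.capitalize())
        arg = f"{f.name}_offset" if f.type is str else f"obj.{f.name}"
        lines.append(f"    {adder}(builder, {arg})")
    lines.append("    return _end(builder)")
    exec("\n".join(lines), namespace)
    return namespace["encode"]


encode = compile_encoder(Weapon, FakeWeaponTable)  # done once per class
builder = FakeBuilder()
encode(Weapon("Sword", 3), builder)
```

The generated function contains only direct calls, so the per-object cost is the calls themselves; whether that is a significant gain over repeated getattr() would need to be measured.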

Additional context If it's feasible, I don't mind implementing the mixin myself.

timrulebosch avatar May 12 '22 19:05 timrulebosch

I'm not the author so won't speak of what's possible, but upon review of the code for any of the json, msgpack, or yaml serializers it appears that all of the code building happens upon conversion to a dictionary. There is no code building being applied to serialize/deserialize objects for the formats supported.

I do think this could be done without a code building strategy, by leveraging a cache on the mixin you create that keeps a mapping of field -> method call. From there you could handle both serialization and deserialization by executing the mappings against the flatbuffer field -> method lookup table.

Not all that familiar with flatbuffers, but maybe something like...

[Edit]: Simplified to its essence.

from typing import Any, Mapping, Optional, Type, TypeVar

from mashumaro.mixins.dict import DataClassDictMixin
from mashumaro.serializer.json import DEFAULT_DICT_PARAMS
from typing_extensions import ClassVar, Protocol

T = TypeVar("T", bound="DataClassFlatBufferMixin")

def get_encoder(type: Type[T]):
    # use type and params to lookup module and methods
    field_encoders = {
        'field_name': lambda buffer, **kwargs: bytearray()  # method call here
    }

    def encoder(buffer: bytearray, obj: Mapping[str, Any]):
        for key in obj.keys():
            # assumed loop body: apply the cached encoder for this field
            buffer = field_encoders[key](buffer, value=obj[key])
        return buffer

    return encoder


def get_decoder(type: Type[T]):
    # use type and params to lookup module and methods
    field_decoders = {
        'field_name': lambda buffer, **kwargs: 0  # method call here
    }

    def decoder(buffer: bytearray):
        return {key: field_decoder(buffer) for key, field_decoder in field_decoders.items()}

    return decoder

class Decoder(Protocol):
    def __call__(self, buffer: bytearray) -> Mapping[str, Any]: ...

class Encoder(Protocol):
    def __call__(self, buffer: bytearray, obj: Mapping[str, Any]) -> bytearray: ...

class DataClassFlatBufferMixin(DataClassDictMixin):
    __slots__ = ()
    __flatbuffer_encoder: ClassVar[Optional[Encoder]]
    __flatbuffer_decoder: ClassVar[Optional[Decoder]]

    # similar to a metaclass (but simpler)
    # allows setting class variables on any subclass of this type
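    def __init_subclass__(cls: Type[T], **kwargs):
        # let DataClassDictMixin build to_dict/from_dict for the subclass first
        super().__init_subclass__(**kwargs)
        cls.__flatbuffer_encoder = None
        cls.__flatbuffer_decoder = None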

    def to_flatbuffer(self: T, buffer: bytearray):
        clazz = type(self)
        if not clazz.__flatbuffer_encoder:
            clazz.__flatbuffer_encoder = get_encoder(clazz)
        # arguments assumed from the Encoder protocol above
        return clazz.__flatbuffer_encoder(buffer, self.to_dict())

    @classmethod
    def from_flatbuffer(
        cls: Type[T],
        data: bytearray,
    ) -> T:
        if not cls.__flatbuffer_decoder:
            cls.__flatbuffer_decoder = get_decoder(cls)
        # arguments assumed from the Decoder protocol above
        return cls.from_dict(cls.__flatbuffer_decoder(data))

BrutalSimplicity avatar May 30 '22 03:05 BrutalSimplicity

For reference, what I currently do is something like this:

Flatbuffer Schema:

namespace MyGame.Sample;

table Weapon {
  name:string;
  damage:short;
}

Using generated code (API generated by Flatbuffer compiler):

import flatbuffers
import MyGame.Sample.Weapon

builder = flatbuffers.Builder(1024)

weapon = builder.CreateString('Sword')
MyGame.Sample.Weapon.Start(builder)
MyGame.Sample.Weapon.AddName(builder, weapon)
MyGame.Sample.Weapon.AddDamage(builder, 3)
sword = MyGame.Sample.Weapon.End(builder)

builder.Finish(sword)
buf = builder.Output()  # Of type `bytearray`.

And then I have a dataclass defined like this:

class Weapon(FlatbufferTable):
    name: str = None
    damage: int = None
    _fbs_table: type = field(default=MyGame.Sample.Weapon, init=False, repr=False, compare=False)

which is used by my encoder library, which operates based on the dataclass definition and "generates" code:

getattr(self._fbs_table, 'Start')(builder)
object_map['name'] = builder.CreateString('Sword')
getattr(self._fbs_table, 'AddName')(builder, object_map['name'])
getattr(self._fbs_table, 'AddDamage')(builder, 3)
getattr(self._fbs_table, 'End')(builder)

timrulebosch avatar Jun 07 '22 10:06 timrulebosch

@BrutalSimplicity thanks for that suggestion. Do you think your idea would work with the "string" case above? For that I would need to call a few functions:

Note that each string in this code is normally generated from the dataclass fields; it's hardcoded here for brevity, so the actual code would have a few extra calls.

object_map['name'] = builder.CreateString("Sword")
getattr(self._fbs_table, 'AddName')(builder, object_map['name'])

I think those getattr calls are going to be expensive(?). However, perhaps it's possible to emit them as a code object and use it in place of the lambda, as you suggested. It seems like it would work.
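For the string case specifically, a closure-based variant might look like the sketch below: the Add* functions are looked up with getattr once when the encoder is built, and string fields get an extra CreateString step before Start(). FakeBuilder, FakeWeaponTable and build_encoder are hypothetical names used only to keep the example runnable without the generated Flatbuffer module, and the "Add" + capitalize() naming is an assumption that fits single-word fields only.

```python
from dataclasses import dataclass, fields


class FakeBuilder:
    """Hypothetical stand-in for flatbuffers.Builder; it only records calls."""

    def __init__(self):
        self.calls = []

    def CreateString(self, value):
        self.calls.append(("CreateString", value))
        return "offset:" + value  # pretend offset


class FakeWeaponTable:
    """Hypothetical stand-in for the generated table module."""

    @staticmethod
    def Start(builder):
        builder.calls.append(("Start",))

    @staticmethod
    def AddName(builder, offset):
        builder.calls.append(("AddName", offset))

    @staticmethod
    def AddDamage(builder, value):
        builder.calls.append(("AddDamage", value))

    @staticmethod
    def End(builder):
        builder.calls.append(("End",))


@dataclass
class Weapon:
    name: str
    damage: int


def build_encoder(cls, table):
    """Resolve the table functions once and return a closure that uses them."""
    start, end = table.Start, table.End
    # One entry per field: (field name, resolved Add function, is it a string?)
    steps = [
        (f.name, getattr(table, "Add" + f.name.capitalize()), f.type is str)
        for f in fields(cls)
    ]

    def encode(obj, builder):
        # Strings are created first, before the table is started.
        offsets = {
            name: builder.CreateString(getattr(obj, name))
            for name, _, is_str in steps
            if is_str
        }
        start(builder)
        for name, add, is_str in steps:
            add(builder, offsets[name] if is_str else getattr(obj, name))
        return end(builder)

    return encode


encode = build_encoder(Weapon, FakeWeaponTable)  # getattr on the table happens here only
builder = FakeBuilder()
encode(Weapon("Sword", 3), builder)
```

The remaining getattr calls read attributes from the dataclass instance; the lookups on the table module are no longer repeated per object.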

timrulebosch avatar Jun 07 '22 10:06 timrulebosch

Hi, guys!

I don't have experience with FlatBuffers, so it'll take me time to dive into this. But we can create a subpackage mashumaro.mixins.third_party with the idea that anyone could create a mixin and put it there, even if there are concerns about the code quality. If someone responsible wants to create mashumaro.mixins.third_party.flatbuffers, I would accept such a pull request without a second thought :)

Fatal1ty avatar Jun 07 '22 10:06 Fatal1ty