[Core feature] Support for Native JSON types in flyte
Motivation: Why do you think this is important?
Currently, Flyte supports JSON through protobuf Struct. This causes many problems for users, especially with ints, because Struct stores every number as a double.
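To make the integer problem concrete, here is a minimal sketch that mimics how Struct treats numbers. It does not use protobuf itself: `to_struct_like` is a hypothetical helper that coerces every number to a Python float (a double), which is the only numeric kind a Struct value can hold.

```python
def to_struct_like(value):
    """Hypothetical stand-in for a Struct round trip: every number becomes a double."""
    if isinstance(value, bool):
        return value  # booleans have their own kind and are left alone
    if isinstance(value, (int, float)):
        return float(value)  # ints are silently widened to doubles
    if isinstance(value, dict):
        return {str(k): to_struct_like(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_struct_like(v) for v in value]
    return value


original = {"epochs": 10, "big_id": 2**53 + 1}
round_tripped = to_struct_like(original)

# The int comes back as a float, and the large int no longer matches.
print(type(round_tripped["epochs"]).__name__)
print(int(round_tripped["big_id"]) == original["big_id"])
```

Small ints only change type, but ints above 2**53 can also change value, since a double cannot represent all of them exactly.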
Goal: What should the final outcome look like, ideally?
Support a new FlyteIDL.LiteralType called LiteralType.JSON, so that users can pass any object that dumps to JSON, using binary / bytearray as the transport layer.
It should also be possible to plug in custom JSON serializers and deserializers for the data, such as JsonPickle.
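A rough sketch of what "JSON over bytes with pluggable serializers" could look like, using only the standard library. `JsonCodec`, `to_wire`, and `from_wire` are hypothetical names for illustration; they are not part of FlyteIDL or flytekit.

```python
import json
from typing import Any, Callable


class JsonCodec:
    """Hypothetical codec: JSON text as the format, bytes as the transport."""

    def __init__(
        self,
        dumps: Callable[[Any], str] = json.dumps,
        loads: Callable[[str], Any] = json.loads,
    ):
        # Any dumps/loads pair with these shapes can be swapped in.
        self._dumps = dumps
        self._loads = loads

    def to_wire(self, obj: Any) -> bytes:
        return self._dumps(obj).encode("utf-8")

    def from_wire(self, payload: bytes) -> Any:
        return self._loads(payload.decode("utf-8"))


codec = JsonCodec()
payload = codec.to_wire({"epochs": 10, "lr": 0.1})
restored = codec.from_wire(payload)

# A custom serializer is passed in the same way as the default one.
sorted_codec = JsonCodec(dumps=lambda o: json.dumps(o, sort_keys=True))
```

Because the payload is JSON text rather than a Struct, `restored["epochs"]` stays an int after the round trip.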
Describe alternatives you've considered
A new LiteralType called LiteralType.JSON. This maps to a Literal of type binary (wire format). Optionally, the metadata / schema can be extracted and stored.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
- [X] Yes
Have you read the Code of Conduct?
- [X] Yes
To support native JSON types in Flyte, you propose introducing a new LiteralType.JSON in FlyteIDL, using binary/bytearray for transport, and allowing custom JSON serializers/deserializers. This approach aims to address issues with current JSON handling via protobuf Struct, particularly with integer representation.
Would you like more detail on how to implement this feature or any potential challenges?
Related to https://github.com/flyteorg/flyte/issues/4505
#take
From https://github.com/flyteorg/flyte/issues/4505#issuecomment-2060451042, I see the following Python types that can use Literal.JSON:
- dicts -> JsonPickle
- dataclasses -> Use Mashumaro's encoders/decoders or the object's to_json + from_json (these methods are defined with DataClassJSONMixin)
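The dataclass case can be sketched with the standard library alone. This is a stand-in, not Mashumaro: the `to_json` and `from_json` methods below are hand-written to mirror the method names mentioned above, and `TrainConfig` is a hypothetical example type.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TrainConfig:  # hypothetical example type
    epochs: int
    learning_rate: float

    def to_json(self) -> str:
        # Hand-written stand-in for the mixin-provided method.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, data: str) -> "TrainConfig":
        return cls(**json.loads(data))


# Serialize to bytes for transport, then rebuild the object on the other side.
wire = TrainConfig(epochs=10, learning_rate=0.1).to_json().encode("utf-8")
restored_config = TrainConfig.from_json(wire.decode("utf-8"))
```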
@kumare3 Are you considering other Python types that would use Literal.JSON?
I am suggesting a wholesale migration from Struct to JSON for these unsupported types.
https://github.com/flyteorg/flyte/pull/5337
The PRs will be created as follows:
- Flyte
  - [Flyte][1][IDL] Binary IDL With MessagePack
  - [Flyte][2][Literal Type For Scalar] Binary IDL With MessagePack
  - [Flyte][3][Attribute Access] Binary IDL With MessagePack
  - [Flyte][4][FlyteCTL] Binary IDL With MessagePack
  - [Flyte][5][Compiler][Struct Literal Type using JSON SCHEMA] Binary IDL With MessagePack
- flytekit
  - [flytekit][1][SimpleTransformer] Binary IDL With MessagePack
  - [flytekit][2][untyped dict] Binary IDL With MessagePack
  - [flytekit][3][list, dict and nested cases] Binary IDL With MessagePack
  - [flytekit][4][pure dataclass and nested dataclass] Binary IDL With MessagePack
  - [flytekit][5][Attribute Access] Binary IDL With MessagePack
  - [flytekit][6][Flyte Types] Binary IDL With MessagePack
follow up:
For flytekit, we can support Dict[int, dataclass] as input, but this also requires changes to click_types.py, which will be big enough to warrant a new PR.
follow up 2:
We should consider rewriting get_literal_type for Dict[int, dataclass] attribute access.
we should consider rewriting get_literal_type for Dict[int, dataclass] attribute access
what is this now and what should it be?
now: Dict[int, dataclass] is not supported, and neither is attribute access on Dict[int, dataclass]
after: Dict[int, dataclass] is supported, and so is attribute access on Dict[int, dataclass]
get_literal_type is the function that provides the type of each field in the Dict, so that propeller can access the attribute's type.
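As a rough illustration of the idea (not flytekit's actual get_literal_type, which works with LiteralType rather than Python types), the sketch below walks an attribute path through a Dict[int, dataclass] annotation and returns the type found at the end. `resolve_attr_type` and `Point` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, get_args, get_origin, get_type_hints


@dataclass
class Point:  # hypothetical example type
    x: int
    label: str


def resolve_attr_type(python_type: Any, path: List[Any]) -> Any:
    """Hypothetical helper: follow dict keys / field names and return the type reached."""
    current = python_type
    for step in path:
        if get_origin(current) is dict:
            key_type, value_type = get_args(current)
            # Only simple class key types (int, str, ...) are handled in this sketch.
            if not isinstance(step, key_type):
                raise TypeError(f"key {step!r} is not of type {key_type}")
            current = value_type
        else:
            # Treat anything else as a class with annotated fields, e.g. a dataclass.
            current = get_type_hints(current)[step]
    return current


field_type = resolve_attr_type(Dict[int, Point], [1, "x"])
print(field_type)
```

Here the path `[1, "x"]` corresponds to an access like `o[1].x`, and the resolved type is the annotation of the `x` field.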