
[Core feature] Support for Native JSON types in flyte

Open kumare3 opened this issue 1 year ago • 6 comments

Motivation: Why do you think this is important?

Currently, Flyte supports JSON through protobuf Struct. This causes many problems for users, especially with integers, because a Struct stores every number as a double.
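To make the integer problem concrete, here is a minimal sketch that uses only the Python standard library. It does not call the protobuf API; `through_double` is a hypothetical helper that simulates what happens to a value stored in a double-valued number field, which is how Struct represents numbers.

```python
import struct


def through_double(value):
    """Simulate storing a number in a double-valued field.

    Hypothetical helper: it packs the value as an IEEE 754 double and
    unpacks it again, which mirrors what a double-only number field does.
    """
    return struct.unpack("<d", struct.pack("<d", float(value)))[0]


# A small int survives numerically, but comes back as a float.
small = through_double(1)

# A large int cannot be represented exactly as a double.
big = 2**53 + 1
big_back = through_double(big)

print(type(small).__name__, small)
print(big, int(big_back))
```

The first case is why an `int` input can reach a downstream task as a `float`; the second shows that integers above 2**53 can silently change value.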

Goal: What should the final outcome look like, ideally?

Support a new FlyteIDL LiteralType called LiteralType.JSON, so that users can pass any object that dumps to JSON, using binary / bytearray as the transport layer.

It should also be possible to plug in custom JSON serializers and deserializers for the data, such as JsonPickle.

Describe alternatives you've considered

  • New LiteralType called LiteralType.JSON.
  • This maps to a Literal of type binary (wire format).
  • Optionally, the metadata / schema can be extracted and stored.
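A rough sketch of the idea, assuming JSON text encoded as UTF-8 bytes is the wire format. The names `to_json_binary` and `from_json_binary` are hypothetical and are not part of FlyteIDL or flytekit; the `dumps` / `loads` parameters stand in for the pluggable serializers mentioned above.

```python
import json
from typing import Any, Callable


def to_json_binary(obj: Any, dumps: Callable[[Any], str] = json.dumps) -> bytes:
    """Serialize an object to JSON and encode it as bytes for transport."""
    return dumps(obj).encode("utf-8")


def from_json_binary(data: bytes, loads: Callable[[str], Any] = json.loads) -> Any:
    """Decode transport bytes and parse them back into a Python object."""
    return loads(data.decode("utf-8"))


payload = {"epochs": 3, "lr": 0.1, "tags": ["a", "b"]}

wire = to_json_binary(payload)
restored = from_json_binary(wire)


# A custom serializer can be swapped in; this one emits compact JSON.
def compact_dumps(obj: Any) -> str:
    return json.dumps(obj, separators=(",", ":"))


compact_wire = to_json_binary(payload, dumps=compact_dumps)
```

Unlike the Struct path, `restored["epochs"]` is still an `int` after the round trip, because JSON text distinguishes `3` from `3.0`.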

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

kumare3 avatar May 03 '24 00:05 kumare3

To support native JSON types in Flyte, you propose introducing a new LiteralType.JSON in FlyteIDL, using binary/bytearray for transport, and allowing custom JSON serializers/deserializers. This approach aims to address issues with current JSON handling via protobuf Struct, particularly with integer representation.

Would you like more detail on how to implement this feature or any potential challenges?


runllm[bot] avatar May 03 '24 00:05 runllm[bot]

Related to https://github.com/flyteorg/flyte/issues/4505

kumare3 avatar May 03 '24 01:05 kumare3

#take

MortalHappiness avatar May 03 '24 02:05 MortalHappiness

From https://github.com/flyteorg/flyte/issues/4505#issuecomment-2060451042, I see the following Python types that can use Literal.JSON:

  1. dicts -> JsonPickle
  2. dataclasses -> Use Mashumaro's encoders/decoders or the object's to_json + from_json (these methods are defined by DataClassJSONMixin)
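For illustration, the dataclass case could look roughly like the sketch below. To stay dependency-free it hand-writes `to_json` / `from_json` with the standard library instead of inheriting them from DataClassJSONMixin, so treat it as a stand-in for the Mashumaro behaviour rather than the real thing. `TrainConfig` is a made-up example class.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TrainConfig:
    """Made-up example dataclass with an int and a float field."""

    epochs: int
    lr: float

    def to_json(self) -> str:
        # Stand-in for the to_json method a JSON mixin would provide.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, data: str) -> "TrainConfig":
        # Stand-in for from_json; only handles flat dataclasses.
        return cls(**json.loads(data))


cfg = TrainConfig(epochs=3, lr=0.1)
restored = TrainConfig.from_json(cfg.to_json())
```

This simple version does not handle nested dataclasses on the way back in, which is one reason to rely on a dedicated encoder/decoder in practice.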

@kumare3 Are you considering other Python types that would use Literal.JSON?

thomasjpfan avatar May 03 '24 15:05 thomasjpfan

I am proposing a wholesale migration from Struct to JSON for these unsupported types.

kumare3 avatar May 03 '24 15:05 kumare3

https://github.com/flyteorg/flyte/pull/5337

wild-endeavor avatar May 08 '24 18:05 wild-endeavor

The PRs will be created as follows:

  • Flyte
      • [Flyte][1][IDL] Binary IDL With MessagePack
      • [Flyte][2][Literal Type For Scalar] Binary IDL With MessagePack
      • [Flyte][3][Attribute Access] Binary IDL With MessagePack
      • [Flyte][4][FlyteCTL] Binary IDL With MessagePack
      • [Flyte][5][Compiler][Struct Literal Type using JSON SCHEMA] Binary IDL With MessagePack

  • flytekit
      • [flytekit][1][SimpleTransformer] Binary IDL With MessagePack
      • [flytekit][2][untyped dict] Binary IDL With MessagePack
      • [flytekit][3][list, dict and nested cases] Binary IDL With MessagePack
      • [flytekit][4][pure dataclass and nested dataclass] Binary IDL With MessagePack
      • [flytekit][5][Attribute Access] Binary IDL With MessagePack
      • [flytekit][6][Flyte Types] Binary IDL With MessagePack

Future-Outlier avatar Sep 18 '24 14:09 Future-Outlier

Follow-up: for flytekit, we can support Dict[int, dataclass] as input, but this also requires changing click_types.py, which is a large enough change to warrant a new PR.

Future-Outlier avatar Sep 25 '24 13:09 Future-Outlier

Follow-up 2: we should consider rewriting get_literal_type for Dict[int, dataclass] attribute access.

Future-Outlier avatar Sep 25 '24 13:09 Future-Outlier

we should consider rewriting get_literal_type for Dict[int, dataclass] attribute access

what is this now and what should it be?

wild-endeavor avatar Sep 27 '24 17:09 wild-endeavor

we should consider rewriting get_literal_type for Dict[int, dataclass] attribute access

what is this now and what should it be?

Now: Dict[int, dataclass] is not supported, and neither is attribute access on Dict[int, dataclass].

After: Dict[int, dataclass] is supported, including attribute access.

get_literal_type is the function that provides the type of each field in the Dict, so that propeller can resolve the type of an accessed attribute.
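As a loose illustration of "providing the type of each field", here is a standard-library sketch. `describe_type` is a hypothetical stand-in and is not flytekit's get_literal_type; it only shows the kind of nested type description that would be needed to resolve an attribute on the values of a Dict[int, dataclass]. `Point` is a made-up example class.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Dict, get_args, get_origin, get_type_hints


@dataclass
class Point:
    """Made-up example dataclass."""

    x: int
    y: float


def describe_type(tp):
    """Return a nested description of a Python type (hypothetical helper)."""
    if is_dataclass(tp):
        # Describe every field so an attribute lookup can find its type.
        hints = get_type_hints(tp)
        return {f.name: describe_type(hints[f.name]) for f in fields(tp)}
    if get_origin(tp) is dict:
        # Keep both the key type and the value type of the mapping.
        key_tp, value_tp = get_args(tp)
        return {"key": describe_type(key_tp), "value": describe_type(value_tp)}
    return tp.__name__


schema = describe_type(Dict[int, Point])
```

With a description like this, the type of an access such as `d[1].x` can be read from `schema["value"]["x"]`.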

Future-Outlier avatar Sep 28 '24 05:09 Future-Outlier