[Core feature] Compress workflow state when storing it in etcd

Open Tom-Newton opened this issue 6 months ago • 1 comments

Motivation: Why do you think this is important?

Same motivation as https://github.com/flyteorg/flyte/issues/4569

In my mind the bigger we can make our workflows before hitting etcd size limits the better.

Goal: What should the final outcome look like, ideally?

Currently the workflow state is stored as a JSON string in etcd. JSON is not a very efficient data format. One alternative might be to encode to protobuf instead of JSON. This would provide savings for all the field names and all the numeric values that are currently stored as strings.
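To make the expected saving concrete, here is a minimal sketch that encodes the same record twice: once as JSON, and once as (field number, varint) pairs that loosely imitate protobuf's wire format for integer fields. The `nodeStatus` struct and its fields are hypothetical stand-ins, not Flyte's real types, and the hand-rolled encoder only approximates what generated protobuf code would produce.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
)

// nodeStatus is a hypothetical, heavily simplified stand-in for per-node
// workflow state. The real Flyte types have many more fields.
type nodeStatus struct {
	Phase       uint64 `json:"phase"`
	Attempts    uint64 `json:"attempts"`
	QueuedAtSec uint64 `json:"queuedAtSeconds"`
}

// encodeJSON serializes the status as JSON: field names are spelled out and
// numbers are written as decimal digits.
func encodeJSON(s nodeStatus) []byte {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // cannot happen for a struct of plain integers
	}
	return b
}

// encodeTagged writes each field as a (tag, value) pair of varints, where
// tag = fieldNumber<<3 with wire type 0. Field names never hit the wire.
func encodeTagged(s nodeStatus) []byte {
	var buf []byte
	for i, v := range []uint64{s.Phase, s.Attempts, s.QueuedAtSec} {
		buf = binary.AppendUvarint(buf, uint64(i+1)<<3)
		buf = binary.AppendUvarint(buf, v)
	}
	return buf
}

// decodeTagged reverses encodeTagged, skipping unknown field numbers.
func decodeTagged(b []byte) (nodeStatus, error) {
	var s nodeStatus
	for len(b) > 0 {
		tag, n := binary.Uvarint(b)
		if n <= 0 {
			return s, errors.New("malformed tag")
		}
		b = b[n:]
		v, m := binary.Uvarint(b)
		if m <= 0 {
			return s, errors.New("malformed value")
		}
		b = b[m:]
		switch tag >> 3 {
		case 1:
			s.Phase = v
		case 2:
			s.Attempts = v
		case 3:
			s.QueuedAtSec = v
		}
	}
	return s, nil
}

func main() {
	s := nodeStatus{Phase: 5, Attempts: 1, QueuedAtSec: 1707163320}
	fmt.Println("json bytes:  ", len(encodeJSON(s)))
	fmt.Println("tagged bytes:", len(encodeTagged(s)))
}
```

The saving comes from the two sources named above: field names are replaced by one-byte tags, and integers are stored in binary rather than as decimal text. String-heavy fields would benefit much less.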

I know that map tasks already do some compression of their state for storing in etcd. I'm proposing expanding that idea and applying it more generally.

Describe alternatives you've considered

Map tasks: Map tasks already use a compressed form for all the workflow state, which is great. However, they currently have other limitations, such as being unable to launch sub-workflows or any node type other than individual tasks. Map tasks are also a less general solution.

Split up the workflow into lots of separate launchplans: This works, and we do currently use it, but it is less ergonomic and requires the end user to know implementation details of Flyte to understand how this helps and how to use it most effectively.

useOffloadedWorkflowClosure: This avoids writing the spec field to etcd, but the status field is much bigger. With dynamic tasks the spec can easily be very small, so this doesn't really help much.

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

Tom-Newton avatar Feb 05 '24 20:02 Tom-Newton

🤖

Thank you for your detailed request and the alternatives you've considered. Your suggestion to compress the workflow state by encoding it to protobuf instead of JSON is a valid approach. Protobuf, or Protocol Buffers, is a method of serializing structured data developed by Google. It is language-agnostic, making it easy to share data across different language platforms. Protobuf data is smaller, faster, and simpler than XML. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. Protobuf has a smaller overhead than JSON, resulting in smaller payloads.

In the Flyte codebase, the Go code is already using the protobuf package for data serialization. The protobuf package is imported and used in various parts of the code. For example, the WorkflowMetadata_OnFailurePolicy type is using the proto.EnumName function from the protobuf package. You can see this in the flyteidl/gen/pb-go/flyteidl/core/workflow.pb.go file.

However, it's important to note that while Protobuf can help with reducing the size of the data, it might not be the ultimate solution for handling very large workflows that exceed the etcd size limits. Other strategies might need to be considered, such as breaking down the workflows into smaller, more manageable pieces.

As for how the state is currently serialized before being stored in etcd: judging from the provided context, where the state is deserialized, it is encoded with Go's gob package and then base64-encoded. In that code, a base64-encoded string is decoded using base64.NewDecoder(base64.RawStdEncoding, bytes.NewReader([]byte(str))), and the result is passed to gob.NewDecoder(reader) to decode the gob-encoded data into a k8s.PluginState object. The reverse process (serialization before storing) would therefore gob-encode the state and then base64-encode the result. You can see this in the flytepropeller/pkg/controller/nodes/handler/state_test.go file.
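The round trip described above can be sketched as follows. The decode side mirrors the calls quoted from the test file; the encode side is the inferred reverse and is an assumption, as is the `pluginState` struct, which is a hypothetical stand-in for k8s.PluginState.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/gob"
	"fmt"
)

// pluginState is a hypothetical stand-in for k8s.PluginState.
type pluginState struct {
	Phase  int
	Reason string
}

// encodeState gob-encodes the state and base64-encodes the result
// (the assumed write path, inferred from the decode path).
func encodeState(s pluginState) (string, error) {
	var buf bytes.Buffer
	enc := base64.NewEncoder(base64.RawStdEncoding, &buf)
	if err := gob.NewEncoder(enc).Encode(s); err != nil {
		return "", err
	}
	// Close flushes any partially written base64 block.
	if err := enc.Close(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// decodeState base64-decodes the string and gob-decodes the result,
// using the same calls as the quoted test code.
func decodeState(str string) (pluginState, error) {
	var s pluginState
	reader := base64.NewDecoder(base64.RawStdEncoding, bytes.NewReader([]byte(str)))
	err := gob.NewDecoder(reader).Decode(&s)
	return s, err
}

func main() {
	encoded, err := encodeState(pluginState{Phase: 3, Reason: "running"})
	if err != nil {
		panic(err)
	}
	decoded, err := decodeState(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d base64 characters, decoded: %+v\n", len(encoded), decoded)
}
```

Note that gob prepends a type description to each stream and base64 adds about a third on top, so for small states this encoding is not especially compact.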

I hope this information is helpful. If you have any more questions or need further clarification, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

dosubot[bot] avatar Feb 05 '24 20:02 dosubot[bot]