[Plugin] TypeTransformer for TensorFlow tf.data.Dataset
Motivation: Why do you think this is important?
The `tf.data.Dataset` object encapsulates data as well as a preprocessing pipeline. It can be passed to the model `fit`, `predict`, and `evaluate` methods. It is widely used in TensorFlow tutorials and documentation and is considered a best practice for building input pipelines that keep GPU resources saturated.
Goal: What should the final outcome look like, ideally?
Flyte tasks should be able to accept `tf.data.Dataset` objects as parameters and return them as outputs.
Describe alternatives you've considered
There are caveats to passing `tf.data.Dataset` objects between tasks. Because a `tf.data.Dataset` can include pipeline steps that call local Python functions (e.g., a `map` or `filter` step), there doesn't seem to be a way to serialize the object without effectively "computing" the pipeline. At times this could be beneficial (running an expensive preprocessing pipeline once can free up the CPU during training), but it could also confuse Flyte end users.
So while adding a type transformer for `tf.data.Dataset` is certainly possible, it's an open question whether Flyte should support it at all given these caveats. The alternative to consider here is to not support `tf.data.Dataset`. This seems like a question for the core Flyte team.
Propose: Link/Inline OR Additional context
There are at least three main ways to serialize/deserialize `tf.data.Dataset` objects, probably in order of least to most complex. Determining which method of serialization/deserialization to use is an open question.
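For illustration, one of the simpler candidates is tf.data's built-in save/load (instance/static methods on `tf.data.Dataset` in recent TF releases; `tf.data.experimental.save`/`load` in older ones). Note that it demonstrates the caveat above: saving walks the entire pipeline, so any `map`/`filter` steps are "computed" as a side effect, and the restored dataset replays stored elements rather than the original graph. A minimal sketch:

```python
import tempfile

import tensorflow as tf

# A dataset whose pipeline includes a local Python function (a `map` step).
ds = tf.data.Dataset.range(5).map(lambda x: x * 2)

# Saving iterates the full pipeline and writes the *elements* to disk,
# so the map step runs here as a side effect of serialization.
path = tempfile.mkdtemp()
ds.save(path)  # tf.data.experimental.save(ds, path) on older TF

# The restored dataset replays the stored elements; the original
# map/filter graph is gone.
restored = tf.data.Dataset.load(path)
print(list(restored.as_numpy_iterator()))  # [0, 2, 4, 6, 8]
```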
Some additional links:
- This TensorFlow GitHub issue asks about ways to serialize/deserialize a `tf.data.Dataset` as a deep copy without the side effect of "computing" the pipeline.
- I asked a similar question on the TensorFlow Forum.
Are you sure this issue hasn't been raised already?
- [X] Yes
Have you read the Code of Conduct?
- [X] Yes