[Plugin] TypeTransformer for TensorFlow tf.data.Dataset

Open dennisobrien opened this issue 2 years ago • 9 comments

Motivation: Why do you think this is important?

The tf.data.Dataset object encapsulates data as well as a preprocessing pipeline. It can be passed to a model's fit, predict, and evaluate methods. It is widely used in TensorFlow tutorials and documentation and is considered a best practice for building input pipelines that saturate GPU resources.

Goal: What should the final outcome look like, ideally?

Flyte tasks should be able to pass tf.data.Dataset objects as parameters and accept them as return types.

Describe alternatives you've considered

There are caveats to passing tf.data.Dataset objects between tasks. Since a tf.data.Dataset object can have pipeline steps that use local Python functions (e.g., a map or filter step), there doesn't seem to be a way to serialize the object without effectively "computing" the pipeline graph. Sometimes this is beneficial (running an expensive preprocessing pipeline once can free up the CPU during training), but it could also confuse the Flyte end user, who might expect the pipeline itself, rather than its output, to be passed along.

So while adding a type transformer for tf.data.Dataset is certainly possible, it remains an open question whether Flyte should actually support it, given all the caveats. The alternative to consider is not supporting tf.data.Dataset at all. This seems like a question for the core Flyte team.

Propose: Link/Inline OR Additional context

There are at least three main ways to serialize/deserialize tf.data.Dataset objects.

  1. tf.data.Dataset.save and tf.data.Dataset.load.
  2. tf.data.Dataset.snapshot
  3. Iterator checkpointing

These are probably listed from least complex to most complex, but which method to use for serialization/deserialization is still an open question.
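As a rough illustration of the first option, here is a minimal sketch of a save/load round trip. It assumes TensorFlow 2.10 or later, where save and load are available on tf.data.Dataset; the helper names `double` and `roundtrip` are hypothetical and exist only for this example. It is not a proposed transformer implementation, just a way to see the caveat described above: the map step runs at save time, so what comes back is the computed data rather than the original pipeline.

```python
import tempfile

import tensorflow as tf


def double(x):
    # A local Python function used as a pipeline step (hypothetical example).
    return x * 2


def roundtrip(ds, path):
    # Saving iterates the dataset, so every pipeline step is executed here.
    ds.save(path)
    # The loaded dataset reads the stored elements; the map step is gone.
    return tf.data.Dataset.load(path)


ds = tf.data.Dataset.range(5).map(double)

with tempfile.TemporaryDirectory() as tmp:
    restored = roundtrip(ds, tmp)
    # Loading is lazy, so read the values before the directory is removed.
    values = [int(v) for v in restored.as_numpy_iterator()]

print(values)
```

A type transformer built on this approach would presumably upload the saved directory on the way out and download it on the way in, which means the upstream task pays the full preprocessing cost once.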

Some additional links:

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

dennisobrien, Oct 29 '22 16:10