flytekit icon indicating copy to clipboard operation
flytekit copied to clipboard

Allow None protocol to mean all data persistence supported storage options in Structured Dataset

Open wild-endeavor opened this issue 3 years ago • 2 comments

TL;DR

Contributors of dataframe handling plugins use the StructuredDatasetTransformerEngine.register method to make the system aware of specific encoders and decoders. Up until now, the user was asked to specify the "protocol" associated with each handler.

This is awkward for two reasons:

  • If your encoder/decoder just relies on the base data persistence layer flytekit provides, there is no reason to ask the contributor to register once per protocol. If we add a new protocol to the persistence layer, then someone would have to go through and update all these.
  • Registering multiple protocols while having the default_for_type option in the signature makes it awkward. The impression is that you have to somehow decide a priori which protocol should be the default. This doesn't make sense - the protocol to pick should default to the storage layer determined by the raw output prefix instead.

We are:

  • making the protocol field optional now.
  • when None, all protocols that the data persistence layer knows about will be registered with that handler.
  • default_for_type will now default to False.

Type

  • [x] Bug Fix
  • [ ] Feature
  • [ ] Plugin

Are all requirements met?

  • [x] Code completed
  • [x] Smoke tested
  • [ ] Unit tests added
  • [ ] Code documentation added
  • [ ] Any pending items have an associated Issue

Complete description

How did you fix the bug, make the feature etc. Link to any design docs etc

Tracking Issue

https://github.com/flyteorg/flyte/issues/2755

wild-endeavor avatar Aug 16 '22 17:08 wild-endeavor

Codecov Report

Merging #1134 (452e582) into master (c33702a) will increase coverage by 0.09%. The diff coverage is 72.11%.

@@            Coverage Diff             @@
##           master    #1134      +/-   ##
==========================================
+ Coverage   68.25%   68.35%   +0.09%     
==========================================
  Files         287      287              
  Lines       25814    25911      +97     
  Branches     2884     2900      +16     
==========================================
+ Hits        17620    17711      +91     
- Misses       7717     7721       +4     
- Partials      477      479       +2     
Impacted Files Coverage Δ
flytekit/types/structured/bigquery.py 0.00% <0.00%> (ø)
flytekit/core/data_persistence.py 33.80% <33.33%> (-0.01%) :arrow_down:
flytekit/types/structured/basic_dfs.py 54.09% <33.33%> (+4.09%) :arrow_up:
flytekit/types/structured/structured_dataset.py 62.11% <69.04%> (+1.13%) :arrow_up:
...ests/flytekit/unit/core/test_structured_dataset.py 98.83% <95.65%> (-0.36%) :arrow_down:
tests/flytekit/unit/core/test_data_persistence.py 100.00% <100.00%> (ø)
...ekit/unit/core/test_structured_dataset_handlers.py 100.00% <100.00%> (ø)
...ctured_dataset/test_structured_dataset_workflow.py 100.00% <100.00%> (ø)
flytekit/extras/tasks/shell.py 77.71% <0.00%> (-0.45%) :arrow_down:
... and 8 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Aug 17 '22 17:08 codecov[bot]

@esadler-hbo this is the pr i'm talking about btw... hoping to get it merged by eod.

wild-endeavor avatar Aug 19 '22 17:08 wild-endeavor