Allow None protocol to mean all data persistence supported storage options in Structured Dataset
TL;DR
Contributors of dataframe handling plugins use the StructuredDatasetTransformerEngine.register method to make the system aware of specific encoders and decoders. Up until now, the user was asked to specify the "protocol" associated with each handler.
This is awkward for two reasons:
- If your encoder/decoder just relies on the base data persistence layer flytekit provides, there is no reason to ask the contributor to register once per protocol. If we add a new protocol to the persistence layer, then someone would have to go through and update all these.
- Registering multiple protocols while having the default_for_type option in the signature makes it awkward. The impression is that you have to somehow decide a priori which protocol should be the default. This doesn't make sense - the protocol to pick should default to the storage layer determined by the raw output prefix instead.
We are:
- making the
protocolfield optional now. - when
None, all protocols that the data persistence layer knows about will be registered with that handler. - default_for_type will now default to False.
Type
- [x] Bug Fix
- [ ] Feature
- [ ] Plugin
Are all requirements met?
- [x] Code completed
- [x] Smoke tested
- [ ] Unit tests added
- [ ] Code documentation added
- [ ] Any pending items have an associated Issue
Complete description
How did you fix the bug, make the feature etc. Link to any design docs etc
Tracking Issue
https://github.com/flyteorg/flyte/issues/2755
Codecov Report
Merging #1134 (452e582) into master (c33702a) will increase coverage by
0.09%. The diff coverage is72.11%.
@@ Coverage Diff @@
## master #1134 +/- ##
==========================================
+ Coverage 68.25% 68.35% +0.09%
==========================================
Files 287 287
Lines 25814 25911 +97
Branches 2884 2900 +16
==========================================
+ Hits 17620 17711 +91
- Misses 7717 7721 +4
- Partials 477 479 +2
| Impacted Files | Coverage Δ | |
|---|---|---|
| flytekit/types/structured/bigquery.py | 0.00% <0.00%> (ø) |
|
| flytekit/core/data_persistence.py | 33.80% <33.33%> (-0.01%) |
:arrow_down: |
| flytekit/types/structured/basic_dfs.py | 54.09% <33.33%> (+4.09%) |
:arrow_up: |
| flytekit/types/structured/structured_dataset.py | 62.11% <69.04%> (+1.13%) |
:arrow_up: |
| ...ests/flytekit/unit/core/test_structured_dataset.py | 98.83% <95.65%> (-0.36%) |
:arrow_down: |
| tests/flytekit/unit/core/test_data_persistence.py | 100.00% <100.00%> (ø) |
|
| ...ekit/unit/core/test_structured_dataset_handlers.py | 100.00% <100.00%> (ø) |
|
| ...ctured_dataset/test_structured_dataset_workflow.py | 100.00% <100.00%> (ø) |
|
| flytekit/extras/tasks/shell.py | 77.71% <0.00%> (-0.45%) |
:arrow_down: |
| ... and 8 more |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
@esadler-hbo this is the pr i'm talking about btw... hoping to get it merged by eod.