flytekit icon indicating copy to clipboard operation
flytekit copied to clipboard

Set default format of structured dataset to empty

Open pingsutw opened this issue 3 years ago • 2 comments

Signed-off-by: Kevin Su [email protected]

TL;DR

https://github.com/flyteorg/flyte/issues/2864

The current default format type is Parquet, and it leads to some issues

  1. SD transformer will always convert dataframe to parquet instead of cls.DEFAULT_FORMATS[df_type]
def t1() -> StructuredDataset: # The default format of structured dataset is Parquet here
  1. Failed to run BQ task if the cache is enabled because type validation is failing.
@task(cache=True, cache_version="1.0")
def t1() -> StructuredDataset: # The default format of structured dataset is Parquet here
    df = pd.DataFrame({"len": [len(sd.open(pd.DataFrame).all())]})
    return StructuredDataset(df, uri=bq_uri) # The format of structured dataset is "" 

In this PR, we set the default format to ""

def t1() -> StructuredDataset: # The default format of structured dataset is "" here
  1. By default, we use handlers[df_type][protocol][""] to encode a dataframe. if handlers[df_type][protocol][""] doesn't exist, then use handlers[df_type][protocol][cls.DEFAULT_FORMATS[df_type]].

Type

  • [x] Bug Fix
  • [ ] Feature
  • [ ] Plugin

Are all requirements met?

  • [x] Code completed
  • [x] Smoke tested
  • [x] Unit tests added
  • [ ] Code documentation added
  • [ ] Any pending items have an associated Issue

Complete description

^^^

Tracking Issue

^^^

Follow-up issue

NA

pingsutw avatar Sep 13 '22 12:09 pingsutw

Codecov Report

Merging #1159 (beec6a7) into master (2ccaed7) will increase coverage by 0.58%. The diff coverage is 76.78%.

@@            Coverage Diff             @@
##           master    #1159      +/-   ##
==========================================
+ Coverage   68.51%   69.09%   +0.58%     
==========================================
  Files         288      295       +7     
  Lines       26085    26983     +898     
  Branches     2918     2537     -381     
==========================================
+ Hits        17871    18644     +773     
- Misses       7735     7843     +108     
- Partials      479      496      +17     
Impacted Files Coverage Δ
flytekit/clis/helpers.py 94.59% <ø> (ø)
flytekit/clis/sdk_in_container/init.py 100.00% <ø> (+33.33%) :arrow_up:
flytekit/configuration/internal.py 16.43% <0.00%> (+0.43%) :arrow_up:
flytekit/core/context_manager.py 39.61% <0.00%> (ø)
flytekit/core/launch_plan.py 57.89% <0.00%> (ø)
flytekit/core/map_task.py 43.11% <0.00%> (ø)
flytekit/deck/__init__.py 0.00% <ø> (ø)
flytekit/extras/tensorflow/__init__.py 0.00% <0.00%> (ø)
flytekit/models/literals.py 40.28% <0.00%> (ø)
flytekit/tools/fast_registration.py 81.53% <0.00%> (-7.53%) :arrow_down:
... and 101 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Sep 13 '22 14:09 codecov[bot]

We are going to hold off on this change for a while, need to get the propeller change out and people migrated first. https://github.com/flyteorg/flytepropeller/pull/483

wild-endeavor avatar Sep 16 '22 23:09 wild-endeavor