Remove pyarrow as a direct dependency

Open thomasjpfan opened this issue 1 year ago • 2 comments

Tracking issue

Continues https://github.com/flyteorg/flyte/issues/4418

Why are the changes needed?

According to https://github.com/flyteorg/flyte/issues/4418#issuecomment-1936333515, pyarrow is the largest dependency. This PR removes it as a direct dependency and lazy loads it instead.

What changes were proposed in this pull request?

With this PR, pyarrow is lazy loaded, using the same mechanism that is already used for pandas.

How was this patch tested?

In two of the test environments, pyarrow is removed to make sure flytekit works without pyarrow installed.
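
The PR checks this by removing pyarrow from whole test environments. As a complementary sketch (a different technique, not what this PR does), a single test can simulate a missing package by setting its `sys.modules` entry to `None`, which makes the `import` statement raise `ImportError`. The names `hide_module` and `pyarrow_importable` are hypothetical.

```python
import contextlib
import sys


@contextlib.contextmanager
def hide_module(name):
    """Make `import name` fail inside the block, then restore the previous state."""
    sentinel = object()
    saved = sys.modules.get(name, sentinel)
    sys.modules[name] = None  # a None entry makes the import statement raise ImportError
    try:
        yield
    finally:
        if saved is sentinel:
            del sys.modules[name]
        else:
            sys.modules[name] = saved


def pyarrow_importable():
    """Report whether `import pyarrow` currently succeeds."""
    try:
        import pyarrow  # noqa: F401
    except ImportError:
        return False
    return True


with hide_module("pyarrow"):
    print(pyarrow_importable())
```

This only blocks the top-level name. Submodules that were imported earlier stay in `sys.modules`, so uninstalling the package in a dedicated environment, as the PR does, remains the stronger check.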

thomasjpfan, Mar 01 '24 16:03

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 84.75%. Comparing base (eb20459) to head (b482901).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2228      +/-   ##
==========================================
+ Coverage   84.74%   84.75%   +0.01%     
==========================================
  Files         315      315              
  Lines       24142    24142              
  Branches     3666     3666              
==========================================
+ Hits        20458    20462       +4     
+ Misses       3025     3024       -1     
+ Partials      659      656       -3     


codecov[bot], Mar 01 '24 16:03

I am marking this as a draft. Removing pyarrow also removes the indirect dependency on numpy (pyarrow was the only dependency that depended on numpy).

I need to make sure flytekit works without numpy installed as well.

thomasjpfan, Mar 01 '24 17:03