Remove pyarrow as a direct dependency
Tracking issue
Continues https://github.com/flyteorg/flyte/issues/4418
Why are the changes needed?
As noted in https://github.com/flyteorg/flyte/issues/4418#issuecomment-1936333515, pyarrow is flytekit's largest dependency. This PR removes it from the direct dependencies and lazy loads it instead.
What changes were proposed in this pull request?
With this PR, pyarrow is lazy loaded, using the same mechanism that is already used for pandas.
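The general lazy-loading pattern can be sketched with the standard library's `importlib.util.LazyLoader`, following the lazy-import recipe in the Python `importlib` docs. This is a minimal illustration, not flytekit's actual helper: the names `lazy_module` and `_MissingModule` are hypothetical here, and the sketch only handles top-level module names.

```python
import importlib.util
import sys
import types


class _MissingModule(types.ModuleType):
    """Hypothetical placeholder returned when a module is not installed.

    Creating it never fails; any attribute access raises ImportError, so the
    error only surfaces when the optional dependency is actually used.
    """

    def __getattr__(self, attr):
        raise ImportError(
            f"Module {self.__name__!r} is not installed; "
            f"it is required to access {attr!r}"
        )


def lazy_module(fullname: str) -> types.ModuleType:
    """Return a module whose code only runs on first attribute access."""
    existing = sys.modules.get(fullname)
    if existing is not None:
        return existing  # already imported: nothing to defer
    spec = importlib.util.find_spec(fullname)
    if spec is None or spec.loader is None:
        return _MissingModule(fullname)  # not installed: defer the error
    # Wrap the real loader so executing the module is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[fullname] = module
    loader.exec_module(module)
    return module
```

With a helper like this, a module-level `pyarrow = lazy_module("pyarrow")` costs almost nothing at import time; the real import, or the `ImportError` for a missing install, only happens when an attribute such as `pyarrow.Table` is first accessed.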
How was this patch tested?
pyarrow is uninstalled in two of the test environments to verify that flytekit works without it.
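A complementary way to exercise the "not installed" path inside a single test process is to put `None` into `sys.modules`: the import system then raises `ModuleNotFoundError`, and `importlib.util.find_spec` returns `None`. A minimal sketch, where `block_modules` and `parquet_backend` are hypothetical helpers rather than flytekit APIs, and only top-level package names are blocked:

```python
import contextlib
import importlib
import importlib.util
import sys
from typing import Iterator

_ABSENT = object()  # marks names that were not in sys.modules before


@contextlib.contextmanager
def block_modules(*names: str) -> Iterator[None]:
    """Temporarily make `import <name>` fail as if the package were missing."""
    saved = {name: sys.modules.get(name, _ABSENT) for name in names}
    try:
        for name in names:
            sys.modules[name] = None  # None entry => ModuleNotFoundError
        yield
    finally:
        # Restore sys.modules exactly as it was.
        for name, previous in saved.items():
            if previous is _ABSENT:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = previous


def parquet_backend() -> str:
    """Hypothetical feature probe that degrades gracefully without pyarrow."""
    if importlib.util.find_spec("pyarrow") is None:
        return "unavailable"
    return "pyarrow"
```

Inside `with block_modules("pyarrow"):`, code paths that should not need pyarrow can be run and checked without maintaining a separate CI environment.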
Codecov Report
All modified and coverable lines are covered by tests.
Project coverage is 84.75%. Comparing base (eb20459) to head (b482901).
@@            Coverage Diff             @@
##           master    #2228      +/-   ##
==========================================
+ Coverage   84.74%   84.75%   +0.01%
==========================================
  Files         315      315
  Lines       24142    24142
  Branches     3666     3666
==========================================
+ Hits        20458    20462       +4
+ Misses       3025     3024       -1
+ Partials      659      656       -3
I am marking this as a draft. Removing pyarrow also removes the indirect dependency on numpy, since pyarrow was the only dependency that pulled in numpy.
I need to make sure flytekit works without numpy installed as well.
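One way to guard against a heavy dependency such as numpy sneaking back in is to check, in a fresh interpreter, whether importing a package loads that dependency as a side effect. A minimal sketch; `eagerly_imports` is a hypothetical helper, and a subprocess is used so that modules already imported by the test runner do not skew the result:

```python
import subprocess
import sys


def eagerly_imports(package: str, dependency: str) -> bool:
    """Return True if `import package` loads `dependency` as a side effect."""
    code = (
        "import importlib, sys; "
        f"importlib.import_module({package!r}); "
        f"print({dependency!r} in sys.modules)"
    )
    # Run in a separate interpreter so this process's sys.modules is ignored.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

A check such as `assert not eagerly_imports("flytekit", "numpy")` would express the goal of this draft; it complements, rather than replaces, running the test suite in an environment where numpy is not installed at all.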