chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Open EugeneTorap opened this issue 2 years ago • 1 comments

Fixes #19986: installing Superset with Python 3.10 fails because pyarrow 5.0.0 doesn't have a wheel for Python 3.10.

SUMMARY

To use Python 3.10 in Superset we need to bump PyArrow (from 5.0.0 to 6.0.1). This PR also bumps Pandas to the latest minor version (from 1.3.4 to 1.4.3).

Pandas 1.4 added a wheel for Python 3.9, Apple Silicon

Pandas 1.4 introduced support for using pyarrow as an engine for reading CSVs, which brings performance improvements (see https://pandas.pydata.org/docs/whatsnew/v1.4.0.html#multi-threaded-csv-reading-with-a-new-csv-engine-based-on-pyarrow for details). Therefore engine="pyarrow" has been added everywhere we call pd.read_csv.
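
For illustration only, here is a minimal sketch of what passing engine="pyarrow" to pd.read_csv looks like. The helper name read_csv_with_pyarrow and the sample data are hypothetical and not taken from the Superset codebase. The sketch assumes pandas >= 1.4 and falls back to the default parser when pyarrow is not installed.

```python
import importlib.util
import io

import pandas as pd


def read_csv_with_pyarrow(source, **kwargs):
    """Read a CSV, preferring the pyarrow engine when pyarrow is importable.

    Hypothetical helper for illustration; not part of Superset.
    """
    if importlib.util.find_spec("pyarrow") is not None:
        # engine="pyarrow" is only accepted by pandas >= 1.4 (assumed here).
        return pd.read_csv(source, engine="pyarrow", **kwargs)
    # Fall back to pandas' default parser when pyarrow is missing.
    return pd.read_csv(source, **kwargs)


# Small in-memory CSV so the example needs no file on disk.
csv_bytes = io.BytesIO(b"a,b\n1,2\n3,4\n")
df = read_csv_with_pyarrow(csv_bytes)
print(df)
```

Note that the pyarrow engine does not accept every pd.read_csv option (chunksize, for example, is not supported), so call sites that rely on such options would need to stay on the default engine.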

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • [ ] Has associated issue:
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

EugeneTorap Aug 07 '22 08:08

Codecov Report

Merging #21002 (122d691) into master (e214e1a) will decrease coverage by 0.09%. The diff coverage is 63.98%.

:exclamation: Current head 122d691 differs from pull request most recent head 213bf79. Consider uploading reports for the commit 213bf79 to get more accurate results

@@            Coverage Diff             @@
##           master   #21002      +/-   ##
==========================================
- Coverage   66.34%   66.25%   -0.10%     
==========================================
  Files        1767     1770       +3     
  Lines       67312    67526     +214     
  Branches     7144     7182      +38     
==========================================
+ Hits        44656    44737      +81     
- Misses      20828    20953     +125     
- Partials     1828     1836       +8     
Flag Coverage Δ
hive 53.17% <45.76%> (+0.01%) :arrow_up:
mysql 80.96% <69.49%> (+0.04%) :arrow_up:
postgres 81.00% <69.49%> (+0.01%) :arrow_up:
presto 53.07% <45.76%> (+0.01%) :arrow_up:
python 81.43% <69.49%> (-0.04%) :arrow_down:
sqlite ?
unit 50.74% <52.54%> (+0.27%) :arrow_up:

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...packages/superset-ui-core/src/query/types/Query.ts 100.00% <ø> (ø)
...set-ui-core/src/ui-overrides/ExtensionsRegistry.ts 100.00% <ø> (ø)
...ackages/superset-ui-core/src/utils/featureFlags.ts 100.00% <ø> (ø)
...rts/src/BigNumber/BigNumberTotal/transformProps.ts 0.00% <0.00%> (ø)
...lugin-chart-echarts/src/BigNumber/BigNumberViz.tsx 0.00% <0.00%> (ø)
...lugin-chart-echarts/src/BoxPlot/EchartsBoxPlot.tsx 0.00% <0.00%> (ø)
.../plugins/plugin-chart-echarts/src/BoxPlot/types.ts 0.00% <ø> (ø)
.../plugin-chart-echarts/src/Funnel/EchartsFunnel.tsx 0.00% <0.00%> (ø)
...d/plugins/plugin-chart-echarts/src/Funnel/types.ts 100.00% <ø> (ø)
...ns/plugin-chart-echarts/src/Gauge/EchartsGauge.tsx 0.00% <0.00%> (ø)
... and 89 more

codecov[bot] Aug 07 '22 10:08

@hughhhh @betodealmeida Can you review it?

EugeneTorap Aug 08 '22 20:08

How should I fix this test? Pandas now returns 0 instead of NaN for the API.

EugeneTorap Aug 08 '22 20:08

> How should I fix this test? Pandas now returns 0 instead of NaN for the API.

Taking another look, I guess 0 makes sense from a contribution point of view. It should be fine in this case.

betodealmeida Aug 08 '22 22:08

@betodealmeida @villebro Can you review again?

EugeneTorap Aug 16 '22 09:08

Nice work! Going to test this out very soon.

I know that an empty result set from SQLAlchemy used to cause an exception in pandas when using PyArrow 6.0 and higher, leading to unfriendly error messages in Explore (and in charts on dashboards) instead of the friendly "No data" message.

cwegener Aug 22 '22 03:08