
fix: improve df to records performance

Open dpgaspar opened this issue 1 year ago • 1 comment

SUMMARY

Leverages pandas' DataFrame.to_dict (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html) to speed up the DataFrame-to-records conversion.
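For illustration, a minimal sketch of the idea. The function names here are hypothetical, and the row-by-row baseline is only an assumed stand-in for the old code, which is not shown in this description. Only `DataFrame.to_dict(orient="records")` and `DataFrame.itertuples` are real pandas calls.

```python
import pandas as pd


def df_to_records_rowwise(df: pd.DataFrame) -> list[dict]:
    # Hypothetical baseline: build one dict per row in Python.
    # This is an assumption, not necessarily the code the PR replaced.
    columns = list(df.columns)
    return [
        dict(zip(columns, row))
        for row in df.itertuples(index=False, name=None)
    ]


def df_to_records_to_dict(df: pd.DataFrame) -> list[dict]:
    # Let pandas build the list of row dicts itself.
    return df.to_dict(orient="records")


df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Both approaches should produce the same records for this simple frame.
assert df_to_records_rowwise(df) == df_to_records_to_dict(df)
print(df_to_records_to_dict(df))
```

Both helpers assume unique column names; with duplicated names, later columns would overwrite earlier ones in each row dict.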

Simple benchmark results:

Time taken with old (rows 10000): 0.024200916290283203
Time taken with new (rows 10000): 0.006799936294555664
Percentage improvement: 71.90%
Time taken with old (rows 20000): 0.020637035369873047
Time taken with new (rows 20000): 0.01266789436340332
Percentage improvement: 38.62%
Time taken with old (rows 30000): 0.030869007110595703
Time taken with new (rows 30000): 0.01958012580871582
Percentage improvement: 36.57%
Time taken with old (rows 40000): 0.04102921485900879
Time taken with new (rows 40000): 0.025703907012939453
Percentage improvement: 37.35%
Time taken with old (rows 50000): 0.052468061447143555
Time taken with new (rows 50000): 0.03229331970214844
Percentage improvement: 38.45%
Time taken with old (rows 60000): 0.06525206565856934
Time taken with new (rows 60000): 0.04095005989074707
Percentage improvement: 37.24%
Time taken with old (rows 70000): 0.07497382164001465
Time taken with new (rows 70000): 0.04742288589477539
Percentage improvement: 36.75%
Time taken with old (rows 80000): 0.0855870246887207
Time taken with new (rows 80000): 0.05132102966308594
Percentage improvement: 40.04%
Time taken with old (rows 90000): 0.0931100845336914
Time taken with new (rows 90000): 0.05876302719116211
Percentage improvement: 36.89%
Time taken with old (rows 100000): 0.10411715507507324
Time taken with new (rows 100000): 0.06470870971679688
Percentage improvement: 37.85%
Time taken with old (rows 110000): 0.11521410942077637
Time taken with new (rows 110000): 0.07151579856872559
Percentage improvement: 37.93%
Time taken with old (rows 120000): 0.12491703033447266
Time taken with new (rows 120000): 0.07925105094909668
Percentage improvement: 36.56%
Time taken with old (rows 130000): 0.13448190689086914
Time taken with new (rows 130000): 0.08515024185180664
Percentage improvement: 36.68%
Time taken with old (rows 140000): 0.1515657901763916
Time taken with new (rows 140000): 0.09524393081665039
Percentage improvement: 37.16%
Time taken with old (rows 150000): 0.15466761589050293
Time taken with new (rows 150000): 0.09760618209838867
Percentage improvement: 36.89%
Time taken with old (rows 160000): 0.1649610996246338
Time taken with new (rows 160000): 0.10681009292602539
Percentage improvement: 35.25%
Time taken with old (rows 170000): 0.1754932403564453
Time taken with new (rows 170000): 0.1124107837677002
Percentage improvement: 35.95%
Time taken with old (rows 180000): 0.18709874153137207
Time taken with new (rows 180000): 0.12268805503845215
Percentage improvement: 34.43%
Time taken with old (rows 190000): 0.2039201259613037
Time taken with new (rows 190000): 0.12427520751953125
Percentage improvement: 39.06%
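The benchmark script itself is not included here. A rough sketch of how such numbers could be produced follows, using only the standard library; `time_call` and `percentage_improvement` are hypothetical helper names. Taking the best of several runs is one assumed way to reduce noise in individual measurements.

```python
import time


def time_call(func, *args, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def percentage_improvement(old_seconds, new_seconds):
    """Relative time saved by the new implementation, in percent."""
    return (old_seconds - new_seconds) / old_seconds * 100


# Reproduce the reported figure for 10000 rows from the timings above.
old, new = 0.024200916290283203, 0.006799936294555664
print(f"Percentage improvement: {percentage_improvement(old, new):.2f}%")
# prints "Percentage improvement: 71.90%"
```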

Memory profiles look the same for the old and new implementations (measured with 1M rows).

[Screenshots: memory profiles, 2024-05-15 at 14:45:34 and 14:46:18]
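The tool used for the profiles above is not stated. As one possible way to run a comparable check, the sketch below measures peak Python-level allocations with the standard library's `tracemalloc`; `peak_memory_bytes` is a hypothetical helper, and the sample workload is a stand-in rather than the actual conversion code.

```python
import tracemalloc


def peak_memory_bytes(func, *args):
    """Run func(*args) and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def build_records(n):
    # Stand-in workload: a list of n small row dicts.
    return [{"id": i, "name": str(i)} for i in range(n)]


peak = peak_memory_bytes(build_records, 100_000)
print(f"peak: {peak / 1024 / 1024:.1f} MiB")
```

Running the same helper against the old and new conversion functions on an identical DataFrame would give two peak values to compare.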

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • [ ] Has associated issue:
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

dpgaspar, May 15 '24 12:05

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 77.62%. Comparing base (76d897e) to head (4cba111). Report is 116 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #28512       +/-   ##
===========================================
+ Coverage   60.48%   77.62%   +17.13%     
===========================================
  Files        1931      521     -1410     
  Lines       76236    37436    -38800     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    29060    -17054     
+ Misses      28017     8376    -19641     
+ Partials     2105        0     -2105     
Flag        Coverage Δ
hive        ?
javascript  ?
mysql       77.18% <100.00%> (?)
postgres    77.29% <100.00%> (?)
presto      53.67% <100.00%> (-0.14%) :arrow_down:
python      77.62% <100.00%> (+14.13%) :arrow_up:
sqlite      76.73% <100.00%> (?)
unit        ?

Flags with carried forward coverage won't be shown.


codecov[bot], May 15 '24 13:05