superset icon indicating copy to clipboard operation
superset copied to clipboard

feat: new Columnar upload form and API

Open dpgaspar opened this issue 1 year ago • 1 comments

SUMMARY

Continuing the work on deprecating/removing server side rendered pages, this adds a new React/Antd Form and backend API for Columnar uploads.

Leverages the work already done on: https://github.com/apache/superset/pull/28164, https://github.com/apache/superset/pull/28105, https://github.com/apache/superset/pull/27840

Columnar upload will accept ZIP files and parquet files.

A refactor was done on CSV and Excel upload, with this PR the initial parsing of the data file to get metadata info such as columns and sheet names was being done on the frontend, now we send the file to the backend to fetch it's metadata info. This is more maintainable and scalable since we avoid code duplication to parse data files, avoid having to add a new frontend dependencies for each file type, avoid blocking the UI while the file is being parsed. Although we send the entire file to the backend, by using pandas or pyarrow we can avoid having to parse the entire file, so I expect this operation to be relatively lightweight.

3 new endpoints are added to fetch data files metadata: api/v1/database/csv_upload_metadata api/v1/database/excel_upload_metadata api/v1/database/columnar_upload_metadata

Permission names for these endpoints are the same as their counterparts, so csv_upload on Database, excel_upload on Database and columnar_upload on Database

Possibly a single endpoint to upload data and a single endpoint to fetch metadata would be sufficient, but that would imply a single permission for all file types, and that's a breaking change. I think that a single permission and only 2 REST API endpoints is better, so having granular permissions for each file type is not actually required. I'll add a v5 breaking item if everyone agrees.

https://github.com/apache/superset/assets/4025227/60d69306-4990-4668-a096-80f9b4106bbd

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • [ ] Has associated issue:
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

dpgaspar avatar Apr 24 '24 09:04 dpgaspar

Codecov Report

Attention: Patch coverage is 59.59596% with 120 lines in your changes are missing coverage. Please review.

Project coverage is 67.46%. Comparing base (2e5f3ed) to head (4d20037). Report is 31 commits behind head on master.

Files Patch % Lines
...set/commands/database/uploaders/columnar_reader.py 37.70% 38 Missing :warning:
superset/databases/api.py 42.37% 34 Missing :warning:
...d/src/features/databases/UploadDataModel/index.tsx 71.15% 14 Missing and 1 partial :warning:
...perset/commands/database/uploaders/excel_reader.py 36.84% 12 Missing :warning:
superset-frontend/src/features/home/RightMenu.tsx 35.29% 9 Missing and 2 partials :warning:
superset/commands/database/uploaders/csv_reader.py 77.77% 4 Missing :warning:
superset/databases/schemas.py 91.66% 4 Missing :warning:
superset-frontend/src/pages/DatabaseList/index.tsx 33.33% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #28192      +/-   ##
==========================================
+ Coverage   60.49%   67.46%   +6.97%     
==========================================
  Files        1931     1933       +2     
  Lines       76241    76544     +303     
  Branches     8566     8572       +6     
==========================================
+ Hits        46122    51642    +5520     
+ Misses      28015    22805    -5210     
+ Partials     2104     2097       -7     
Flag Coverage Δ
hive 49.25% <55.11%> (+0.08%) :arrow_up:
javascript 57.71% <61.11%> (-0.02%) :arrow_down:
mysql 77.42% <59.11%> (?)
postgres 77.53% <59.11%> (?)
presto 53.87% <55.11%> (+0.06%) :arrow_up:
python 77.91% <59.11%> (+14.42%) :arrow_up:
sqlite 76.99% <59.11%> (?)
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar May 01 '24 15:05 codecov-commenter