superset icon indicating copy to clipboard operation
superset copied to clipboard

feat(csv-file-validations): Csv File Validations in CsvToDatabseConfiguration

Open SamraHanifCareem opened this issue 2 years ago • 5 comments

SUMMARY Adding CSV File Validations such as:

  • Max File Size
  • Rows to Read

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

CSV_file_validation

TESTING INSTRUCTIONS

  • Configure the following keys in config.py according to the requirements:
    1. CSV_MAX_SIZES 
      
    2. CSV_MAX_ROWS 
      
    3. CSV_MIN_ROWS
      
  • Route to Databases Screen
  • Upload CSV File

ADDITIONAL INFORMATION

  • [x] Required config changes : CSV_MAX_SIZES, CSV_MAX_ROWS, CSV_MIN_ROWS
  • [x] Changes UI

SamraHanifCareem avatar Jan 04 '23 10:01 SamraHanifCareem

Codecov Report

Merging #22587 (1cbed2d) into master (cf156f1) will decrease coverage by 11.08%. The diff coverage is 100.00%.

:exclamation: Current head 1cbed2d differs from pull request most recent head 4a50ece. Consider uploading reports for the commit 4a50ece to get more accurate results

@@             Coverage Diff             @@
##           master   #22587       +/-   ##
===========================================
- Coverage   67.00%   55.92%   -11.09%     
===========================================
  Files        1859     1859               
  Lines       71019    71028        +9     
  Branches     7766     7766               
===========================================
- Hits        47587    39719     -7868     
- Misses      21410    29287     +7877     
  Partials     2022     2022               
Flag Coverage Δ
hive 52.48% <100.00%> (+0.01%) :arrow_up:
mysql ?
postgres ?
presto 52.37% <100.00%> (+0.01%) :arrow_up:
python 58.16% <100.00%> (-23.17%) :arrow_down:
sqlite ?
unit 51.47% <100.00%> (+0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/config.py 91.01% <100.00%> (-0.53%) :arrow_down:
superset/views/base.py 60.82% <100.00%> (-15.47%) :arrow_down:
superset/views/database/forms.py 86.90% <100.00%> (+1.00%) :arrow_up:
superset/utils/dashboard_import_export.py 0.00% <0.00%> (-100.00%) :arrow_down:
superset/tags/core.py 4.54% <0.00%> (-95.46%) :arrow_down:
superset/key_value/commands/update.py 0.00% <0.00%> (-90.91%) :arrow_down:
superset/key_value/commands/delete.py 0.00% <0.00%> (-87.88%) :arrow_down:
superset/key_value/commands/delete_expired.py 0.00% <0.00%> (-84.00%) :arrow_down:
superset/dashboards/commands/importers/v0.py 15.62% <0.00%> (-76.25%) :arrow_down:
superset/datasets/commands/create.py 30.61% <0.00%> (-69.39%) :arrow_down:
... and 287 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Jan 05 '23 20:01 codecov[bot]

@eschutho @betodealmeida This feature is very much needed, to stop application from crashing when large files are uploaded.

j4m4l avatar Feb 06 '23 09:02 j4m4l

Hey @SamraHanifCareem ! Thanks for addressing the problem and submitting a PR. We shouldn't limit the file size the user can upload (at least not by rows). If file is too big, we should consider some streaming/chunking of the file instead.

Antonio-RiveroMartnez avatar Feb 24 '23 13:02 Antonio-RiveroMartnez

Agreed that we shouldn't base things on rows, when the number of columns can be enormous, or even the contents of just a few cells. Any plans to investigate some sort of filesize limit or streaming option here @SamraHanifCareem? This PR needs a rebase in any case, if we're planning to carry it forward. Thanks!

rusackas avatar Feb 09 '24 20:02 rusackas

Yes @rusackas , let me investigate and come up with a short RFC on the proposed solution. cc: @j4m4l

SamraHanifCareem avatar Feb 12 '24 05:02 SamraHanifCareem