vscode_rainbow_csv
Consider skipping first (metadata) row during autodetection if the number of fields in it is inconsistent with other rows.
Take, for example, the following CSV that Microsoft uses:
Machine Vulnerabilities Export,28 Feb 2023 11:42 AM +00:00
Severity,CVSS v3,Age (days),Has Exploit,Has Known Threats,Has Associated Alerts,Related Software,Description
Medium,6.5,14,False,False,False,mozilla:firefox;mozilla:firefox_esr;mozilla:thunderbird;suse:mozillafirefox;suse:mozillafirefox-devel;suse:mozillafirefox-translations-common;suse:mozillafirefox-translations-other;suse:mozillathunderbird;suse:mozillathunderbird-translations-common;suse:mozillathunderbird-translations-other;suse:libmozjs-102-0;suse:mozjs102;suse:mozjs102-devel;suse:mozillafirefox-branding-upstream,"This vulnerability affects the following vendors: Mozilla, Suse. To view more details about this vulnerability please visit the vendor website."
High,8.8,14,,False,False,False,ubuntu:firefox;mozilla:firefox,"This vulnerability affects the following vendors: Mozilla, Ubuntu. To view more details about this vulnerability please visit the vendor website."
The separator is ",", but ";" is detected instead, even though it appears only inside cell values here.
Maybe look past the first row to find the correct separator.
The autodetection algorithm works this way:
- It tries to find a separator that would provide a consistent number of fields in each line. This attempt fails for this file with the "," separator because the very first line has 2 fields, while other rows have 8 fields.
- If step 1 fails and the extension is not ".csv", the autodetection algorithm just exits. Otherwise it tries to find the best separator by frequency, and here the ";" separator is probably more frequent than ",".
I guess one way to improve the algorithm would be to introduce a step "1.5" which retries step 1 without the first line, since some CSV files contain meta information in their first line.
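To make the idea concrete, here is a rough sketch of step 1 followed by the proposed step "1.5". It is written in Python for brevity and is not the extension's actual code; the function names, the candidate list, and the shortened sample rows are all made up for illustration. The frequency-based fallback from step 2 is left out.

```python
import csv

# Hypothetical candidate list; the real extension may check other separators.
CANDIDATES = [",", ";", "\t"]

def field_counts(lines, sep):
    """Return the number of fields in each line, honoring double-quoted values."""
    return [len(row) for row in csv.reader(lines, delimiter=sep)]

def consistent_separator(lines, candidates=CANDIDATES):
    """Step 1: first separator that yields the same field count (> 1) on every line."""
    for sep in candidates:
        counts = field_counts(lines, sep)
        if counts and counts[0] > 1 and len(set(counts)) == 1:
            return sep
    return None

def detect_separator(lines, candidates=CANDIDATES):
    """Step 1, then the proposed step 1.5: retry without the first line."""
    sep = consistent_separator(lines, candidates)
    if sep is None and len(lines) > 2:
        # The first line may be a metadata row with a different field count.
        sep = consistent_separator(lines[1:], candidates)
    return sep

# Shortened, made-up rows shaped like the export above.
sample = [
    "Export,28 Feb 2023",
    "Severity,Software,Description",
    'Medium,a:b;c:d;e:f,"Vendors: X, Y."',
    'High,g:h,"Vendors: X, Z."',
]
print(detect_separator(sample))
```

With these rows, step 1 alone finds no consistent separator because the metadata line has 2 fields and the rest have 3. The retry without the first line settles on ",".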
Thank you for this detailed explanation! Your proposed solution seems suitable.