visidata
[csv-] use newline='' for opening files
The csv loader should use newline=''. According to the csv library documentation:
"If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly [...] It should always be safe to specify newline='', since the csv module does its own (universal) newline handling."
newline='' is already used for saving CSV files (https://github.com/saulpw/visidata/issues/1362), added by 7a5e8341237c539ff9b8bdd51ff568d822e015b8.
An example file that demonstrates the difference: newlines.csv
In vd newlines.csv, the third row, second column, loads as "\n" (on my Ubuntu system, where the newline is "\n"; on Windows it would load as "\r\n", I think). After the patch it loads as "\r\n".
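For anyone who wants to reproduce the difference outside of vd, here is a minimal sketch using only the standard library. The file contents and the helper names (read_csv_rows, demo) are made up for illustration; this is not the attached newlines.csv and not visidata's loader code.

```python
import csv
import os
import tempfile


def read_csv_rows(path, **open_kwargs):
    """Parse a CSV file, forwarding extra keyword arguments to open()."""
    with open(path, encoding="utf-8", **open_kwargs) as fp:
        return list(csv.reader(fp))


def demo():
    # Hypothetical sample: a quoted field containing an embedded "\r\n".
    raw = b'a,b\r\n1,"x\r\ny"\r\n'
    with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
        tmp.write(raw)
    try:
        # Default text mode: line endings are translated before csv sees them.
        default_rows = read_csv_rows(tmp.name)
        # newline='': the csv module receives the original line endings.
        preserved_rows = read_csv_rows(tmp.name, newline="")
    finally:
        os.remove(tmp.name)
    return default_rows, preserved_rows


if __name__ == "__main__":
    default_rows, preserved_rows = demo()
    print(repr(default_rows[1][1]))
    print(repr(preserved_rows[1][1]))
```

The only difference between the two reads is the newline='' argument passed to open(), which is the change this PR proposes for the loader.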
I'm making this PR a draft, pending others' input, since I'm not well-versed in parsing CSV files.
Is there a canonical test suite of CSV files for parsing? I'd prefer to test this change before it's deployed.
Okay, so I made a small change to open_text_source to pass through the kwargs. So this PR can be changed to simply the first newline='' change (which seems reasonable to me).
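As a rough illustration of the pass-through idea, a sketch might look like the following. Note that open_text and load_csv here are hypothetical stand-ins written for this example; they are not visidata's open_text_source or its csv loader, whose actual signatures may differ.

```python
import csv
import os
import tempfile


def open_text(path, mode="r", encoding="utf-8", **kwargs):
    """Hypothetical opener: forwards any extra kwargs (e.g. newline) to open()."""
    return open(path, mode=mode, encoding=encoding, **kwargs)


def load_csv(path):
    """Hypothetical loader: requests newline='' so csv handles line endings."""
    with open_text(path, newline="") as fp:
        return list(csv.reader(fp))


def demo_passthrough():
    # Write a small file whose quoted field contains an embedded "\r\n".
    with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
        tmp.write(b'k,v\r\n1,"a\r\nb"\r\n')
    try:
        return load_csv(tmp.name)
    finally:
        os.remove(tmp.name)
```

With the opener forwarding its keyword arguments, the loader-side change reduces to adding newline='' at the call site.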
We have sample_data/errors.csv, which should cover most of the weirdnesses found in csv-spectrum and csvkit/examples. But it doesn't cover encodings or alternate line-endings; we'd probably need separate files for those.