
[csv-] use newline='' for opening files

Open · midichef opened this issue 10 months ago · 1 comment

The csv loader should use newline=''. According to the csv library documentation:

"If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly [...] It should always be safe to specify newline='', since the csv module does its own (universal) newline handling."

newline='' is already used for saving CSV files (https://github.com/saulpw/visidata/issues/1362), added by 7a5e8341237c539ff9b8bdd51ff568d822e015b8.

An example file that demonstrates the difference: newlines.csv

In vd newlines.csv, the third row, second column currently loads as "\n" (on my Ubuntu system, where the newline is "\n"; on Windows it would load as "\r\n", I think). After the patch it loads as "\r\n".
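The difference can be reproduced outside VisiData with the standard library alone. The following is a minimal sketch, not the loader's actual code: the helper name load_csv_rows and the sample bytes are made up for illustration, and the sample is not the attached newlines.csv.

```python
import csv
import os
import tempfile


def load_csv_rows(raw: bytes, **open_kwargs):
    """Write raw bytes to a temp file, then parse it with csv.reader.

    Hypothetical helper for illustration; extra keyword arguments
    (such as newline='') are forwarded to the built-in open().
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        with open(path, encoding="utf-8", **open_kwargs) as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


# A quoted field containing an embedded CRLF; records also end in CRLF.
RAW = b'a,b\r\n1,"x\r\ny"\r\n'

default_rows = load_csv_rows(RAW)              # text layer translates "\r\n" to "\n"
fixed_rows = load_csv_rows(RAW, newline="")    # csv module sees the original bytes

print(repr(default_rows[1][1]))  # 'x\ny'
print(repr(fixed_rows[1][1]))    # 'x\r\ny'
```

Without newline='', the text layer rewrites the embedded line ending before the csv module ever sees it; with newline='', the field keeps the bytes that were in the file.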

I'm making this PR a draft, pending others' input, since I'm not well-versed in parsing CSV files.

Is there a canonical test suite of CSV files for parsing? I'd prefer to test this change before it's deployed.

midichef avatar Apr 07 '24 00:04 midichef

Okay, so I made a small change to open_text_source to pass through the kwargs, which means this PR can be reduced to just the first newline='' change (which seems reasonable to me).
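As a rough illustration of what passing kwargs through means here, consider the sketch below. It is not VisiData's open_text_source: the function name open_text_passthrough and its parameters are assumptions for illustration only. It just shows a wrapper forwarding newline='' to the underlying text layer.

```python
import csv
import io


def open_text_passthrough(raw: bytes, encoding="utf-8", **kwargs):
    """Hypothetical stand-in for a text-opening helper.

    Any extra keyword arguments (for example newline='') are handed
    unchanged to io.TextIOWrapper, so the caller controls newline handling.
    """
    return io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, **kwargs)


raw = b'id,note\r\n1,"line one\r\nline two"\r\n'

# The caller, not the helper, decides on newline=''.
with open_text_passthrough(raw, newline="") as fp:
    rows = list(csv.reader(fp))

print(rows)
```

With this shape, the csv loader can ask for newline='' at its own call site while other callers of the helper keep the default behaviour.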

We have sample_data/errors.csv which should cover most of the weirdnesses found in csv-spectrum and csvkit/examples. But it doesn't cover encodings or alternate line-endings; we'd probably need separate files for those.

saulpw avatar May 18 '24 22:05 saulpw