miller
More than one blank line at the end of CSV file: automate cleanup?
Hi @johnkerl, I sometimes run into errors like this:
mlr: mlr: CSV header/data length mismatch 2 != 1 at filename tmp.csv row 4.
This error can have many causes. One of them is a CSV file that ends with two blank lines instead of one:
a,b
1,2
4,6
I'm also attaching a screenshot.
With very large files this is something I often miss: I don't notice the extra blank lines and go looking for other kinds of errors instead.
Do you think it makes sense to introduce a clearer error message, and/or to automatically clean up trailing blank lines, reducing them to one when there is more than one?
Thank you
@johnkerl If you think it's not worth pursuing, I'll close it.
Thank you always
@aborruso thanks -- let's keep this open -- it's worth pursuing
@aborruso there are a few issues:
- Enabling non-compliant CSV by default is a slippery slope
- Even as opt-in behavior:
  - If the CSV has 2 or more columns then clearly the entirely empty lines can be ignored on an opt-in basis
  - But if the CSV has 1 column then it's not clear whether the blank line is
    - A row to be ignored
    - A completely legitimate row which has the empty-string value
My thought is to have an opt-in flag but with the caveat to the user that it will "eat" legitimate empty-string final rows in the one-column case ...
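To make the trade-off concrete, here is a minimal Python sketch of that opt-in idea, run as a pre-processing step outside Miller. The function name `drop_trailing_blank_lines` and the `eat_single_column` parameter are hypothetical and are not Miller flags. The sketch assumes the header is the first line and contains no embedded newlines.

```python
import csv
import io


def drop_trailing_blank_lines(text: str, eat_single_column: bool = False) -> str:
    """Remove entirely empty lines at the end of CSV text.

    Hypothetical helper, not part of Miller. With a one-column header a
    blank line may be a legitimate empty-string record, so such lines are
    only dropped when eat_single_column is True.
    """
    lines = text.split("\n")
    if not lines[0].strip():
        # No usable header line: leave the input untouched.
        return text
    # Count header fields with the csv module so quoted commas are handled.
    header = next(csv.reader(io.StringIO(lines[0])))
    if len(header) < 2 and not eat_single_column:
        return text
    # Pop empty lines off the end. This includes the empty string that
    # split() yields after a final newline.
    while len(lines) > 1 and lines[-1].strip("\r") == "":
        lines.pop()
    # Restore a single terminating newline.
    return "\n".join(lines) + "\n"
```

With a two-column header the trailing blank lines are removed. With a one-column header the text comes back unchanged unless the caller explicitly accepts that empty-string final rows will be eaten.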
> But if the CSV has 1 column
This is the problem :(
The only thing I can think of is this: if the CSV has more than one blank line at the end, and those trailing lines trigger a data length mismatch (2 or more != 1), keep only one blank line at the end.
It may be risky, though, and it may be better to hit the error and fix the file.
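A rough Python sketch of that heuristic, as a step run before Miller rather than anything Miller does today. The name `collapse_trailing_blank_lines` is made up for illustration, and the sketch assumes `\n` line endings and a header on the first line.

```python
import csv
import io


def collapse_trailing_blank_lines(text: str) -> str:
    """Reduce several trailing blank lines to one, for multi-column CSV only.

    Hypothetical helper, not part of Miller.
    """
    header_line = text.split("\n", 1)[0]
    if not header_line.strip():
        return text
    header = next(csv.reader(io.StringIO(header_line)))
    if len(header) < 2:
        # One column: a blank line could be a real empty-string record.
        return text
    body = text.rstrip("\n")
    trailing_newlines = len(text) - len(body)
    # One newline ends the last record; each further one is a blank line.
    if trailing_newlines <= 2:
        return text
    # Keep the record terminator plus exactly one blank line.
    return body + "\n\n"
```

One-column files are left alone on purpose, since that is exactly the case where the blank lines might be data.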