
More than one blank line at the end of CSV file: automate cleanup?

Open aborruso opened this issue 1 year ago • 4 comments

Hi @johnkerl, I sometimes run into errors like this:

mlr: mlr: CSV header/data length mismatch 2 != 1 at filename tmp.csv row 4.

This error has many possible causes. One of them is a CSV file that ends with two blank lines instead of one:

a,b
1,2
4,6


I'm also attaching a screenshot:

image

With very large files I often miss this: I don't notice the extra blank lines and go looking for other kinds of errors instead.

Do you think it makes sense to introduce a clearer error message, and/or to automatically clean up trailing blank lines, reducing them to one when there is more than one?

Thank you

aborruso avatar Jun 01 '24 17:06 aborruso

@johnkerl If you think it's not worth pursuing, I'll close it.

Thank you always

aborruso avatar Jun 23 '24 07:06 aborruso

@aborruso thanks -- let's keep this open -- it's worth pursuing

johnkerl avatar Jun 23 '24 14:06 johnkerl

@aborruso there are a few issues:

  • Enabling non-compliant CSV by default is a slippery slope
  • Even as opt-in behavior:
    • If the CSV has 2 or more columns then clearly the entirely empty lines can be ignored on an opt-in basis
    • But if the CSV has 1 column then it's not clear whether the blank line is
      • A row to be ignored
      • A completely legitimate row which has the empty-string value

My thought is to have an opt-in flag but with the caveat to the user that it will "eat" legitimate empty-string final rows in the one-column case ...
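To make the trade-off concrete, here is a minimal sketch of that opt-in behavior as a pre-processing step outside Miller. It is not Miller code and does not correspond to any existing Miller flag; the function name `strip_trailing_blank_lines` and the parameter `force_single_column` are made up for illustration.

```python
import csv
import io


def strip_trailing_blank_lines(text, force_single_column=False):
    """Drop entirely empty lines at the end of CSV text.

    Hypothetical helper, not part of Miller. For a one-column CSV the
    trailing blank lines may be legitimate empty-string rows, so they are
    left alone unless the caller opts in with force_single_column=True.
    Line endings in the result are normalized to "\n".
    """
    lines = text.splitlines()
    if not lines:
        return text

    # Count columns from the header line using the standard csv parser.
    header = next(csv.reader(io.StringIO(lines[0])), [])
    if len(header) < 2 and not force_single_column:
        return text  # ambiguous case: do nothing by default

    # Remove empty lines from the end only; interior blank lines are kept.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines) + "\n"
```

With two or more columns the blank lines are dropped. With one column nothing changes unless the caller explicitly accepts that real empty-string final rows will be "eaten".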

johnkerl avatar Jun 23 '24 20:06 johnkerl

"But if the CSV has 1 column"

This is the problem :(

The only thing I can think of is this: if the CSV has more than one blank line at the end, and those trailing lines produce a data length mismatch of 2 (or more) != 1, keep only one blank line at the end.

It may be risky, though, and it may be better to stop at the error and fix the file.
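For reference, the collapsing idea could look roughly like the sketch below, written as a standalone cleanup step rather than as Miller behavior. The name `collapse_trailing_blank_lines` is hypothetical. It only acts when the header has two or more columns, which is my reading of the "2 (or more) != 1" condition.

```python
import csv
import io


def collapse_trailing_blank_lines(text):
    """Reduce a run of blank lines at the end of CSV text to a single one.

    Hypothetical helper, not part of Miller. Applies only when the header
    has two or more columns, where an empty line cannot be a valid row.
    Line endings in the result are normalized to "\n".
    """
    lines = text.splitlines()
    if not lines:
        return text

    # Count columns from the header line using the standard csv parser.
    header = next(csv.reader(io.StringIO(lines[0])), [])
    if len(header) < 2:
        return text  # one-column case is ambiguous, leave untouched

    # Strip every empty line from the end and remember how many there were.
    trailing = 0
    while lines and lines[-1] == "":
        lines.pop()
        trailing += 1

    # Put back exactly one blank line if there was at least one.
    if trailing > 0:
        lines.append("")
    return "\n".join(lines) + "\n"
```

Interior blank lines are not touched, so a real mismatch in the middle of the file would still surface as an error.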

aborruso avatar Jun 24 '24 16:06 aborruso