here-cli
detect line breaks in the middle of a CSV row
Some CSVs contain a line break in the middle of a field. We currently interpret the rest of the row as a new feature. If the break occurs before the lat/lon fields, we do not add any coordinates. If it occurs after the lat/lon fields, we truncate the record and then try to add a second, incomplete/invalid record.
This is common when the CSV was generated via Excel. See the attached CSV.
There are a couple of ways we could handle this:
- disregard line breaks within quoted fields (this seems to be consistent with the attached CSV)
- keep track of the expected number of columns (i.e. how many commas there are) in a row, and how many we have processed so far. If we see a line break (CR/LF) before reaching the expected number of columns, and the number of columns in the next row plus the number of columns seen in the previous row add up to the expected number of columns, make it one record.
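To make the two options concrete, here is a rough sketch in Python (not the CLI's own code, just an illustration). `parse_quoted` and `merge_split_rows` are hypothetical names. The second function counts commas naively and ignores quoting, and joining the two halves with a space is an assumption about what the merged field should look like.

```python
import csv
import io


def parse_quoted(text):
    """Option 1: a quote-aware parser keeps line breaks inside quoted fields."""
    # newline="" leaves embedded line breaks untouched so the csv module
    # can decide which ones end a record and which ones belong to a field.
    return list(csv.reader(io.StringIO(text, newline="")))


def merge_split_rows(lines, expected_commas):
    """Option 2: rejoin a row that was split in two by a stray line break.

    Naive sketch: counts commas only and does not understand quoting.
    """
    merged = []
    i = 0
    while i < len(lines):
        line = lines[i]
        seen = line.count(",")
        if seen < expected_commas and i + 1 < len(lines):
            following = lines[i + 1]
            # The two fragments together have exactly the expected number
            # of separators, so treat them as one record.
            if seen + following.count(",") == expected_commas:
                merged.append(line + " " + following)
                i += 2
                continue
        merged.append(line)
        i += 1
    return merged


quoted = 'id,name,lat,lon\r\n1,"Main\nStreet",52.5,13.4\r\n'
rows = parse_quoted(quoted)

unquoted = ["id,name,lat,lon", "1,Main", "Street,52.5,13.4", "2,Oak,48.1,11.6"]
fixed = merge_split_rows(unquoted, expected_commas=3)
```

Option 1 only helps when the field is actually quoted, which seems to be the case for Excel exports. Option 2 covers unquoted breaks, but it would misfire on fields that legitimately contain commas, so it is probably best kept as a fallback.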
A related issue is orphaned quotation marks: if a row contains a single " with no closing ", the CLI hangs while streaming.
The CSV parser seems to catch this in interactive mode (and while streaming smaller files), but we should just note the error and continue the upload.
When streaming, we should also report an error, and ideally a line number so users can track down the issue.
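One possible way to note the error and keep going, sketched in Python for illustration only. `scan_records` and `max_span` are hypothetical: the idea is to count quotation marks per physical line, let a quoted field span at most `max_span` lines, and otherwise report the starting line number and resume on the next line. The cap is an assumption about how long a legitimate multi-line field can be.

```python
def scan_records(lines, max_span=3):
    """Group physical lines into records; report rows with an orphaned quote.

    Returns (records, errors), where records is a list of
    (start_line_number, text) tuples and errors is a list of messages.
    """
    records, errors = [], []
    i = 0
    while i < len(lines):
        quotes = 0
        end = i
        balanced = False
        # Extend the record line by line until the quote count is even,
        # giving up once it would span more than max_span lines.
        while end < len(lines) and end - i < max_span:
            quotes += lines[end].count('"')
            if quotes % 2 == 0:
                balanced = True
                break
            end += 1
        if balanced:
            records.append((i + 1, "\n".join(lines[i:end + 1])))
            i = end + 1
        else:
            # Skip only the offending line so later rows are still uploaded.
            errors.append(f"line {i + 1}: unbalanced quotation mark, row skipped")
            i += 1
    return records, errors


sample = ["id,name", '1,"Main', 'Street"', '2,Oak "Ave', "3,Elm", "4,Pine", "5,Ash"]
records, errors = scan_records(sample)
```

Because an escaped quote (`""`) adds two to the count, it does not disturb the parity check. The line numbers here are 1-based physical lines, which should match what users see in a text editor.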