logstash-filter-csv

Handle new lines within fields of a record

(Open) ghost opened this issue 8 years ago • 6 comments

Hi, I am currently using Logstash 2.3.0 with the csv filter to define and match columns on CSV data read by Filebeat. The issue is that, in a few cases, a row of the CSV file spans multiple lines, i.e. there is a CRLF before the end of the row (due to the kind of data in the table, as this is a table export).

For example, my filter is: csv { columns => ['ID','NAME','GRADE','SUBJECTS','EOF'] }

Works for CSV data:
1, ABC, FIVE,COMPUTERS,$$$
2, EFG, FIVE,SCIENCE,$$$

Fails when a CRLF is present before the end of the row:
3, ABCV, FIVE, COMPUTERS
SCIENCE,$$$
4, ABCV, FIVE, COMPUTERS,$$$

So the rows containing a CRLF get rejected with a parse exception. Is there a way I can specify the row separator (in my case it is $$$ followed by CRLF)?

Or is there any configuration which I can use to manage this scenario?

Any suggestions? Thanks.
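A minimal Ruby sketch of the record-terminator idea, assuming (as in the example data above) that every record ends with `$$$` and that a line break inside a field is never directly preceded by `$$$`. The helper `logical_records` is hypothetical and not part of logstash-filter-csv: it buffers physical lines until one ends with the terminator, joins the pieces with a space, and only then hands the record to Ruby's standard `CSV.parse_line`.

```ruby
require "csv"

# Hypothetical helper (not a logstash-filter-csv API): groups physical lines
# into logical records, assuming every record ends with the "$$$" marker.
def logical_records(text, terminator: "$$$")
  records = []
  buffer = []
  text.each_line do |line|
    line = line.chomp # removes a trailing "\r\n", "\n" or "\r"
    buffer << line
    next unless line.rstrip.end_with?(terminator)

    # Join continuation lines with a space so no field contains CR/LF.
    records << buffer.join(" ")
    buffer = []
  end
  records << buffer.join(" ") unless buffer.empty? # unterminated tail
  records
end

data = "1, ABC, FIVE,COMPUTERS,$$$\r\n" \
       "2, EFG, FIVE,SCIENCE,$$$\r\n" \
       "3, ABCV, FIVE, COMPUTERS\r\n" \
       "SCIENCE,$$$\r\n" \
       "4, ABCV, FIVE, COMPUTERS,$$$\r\n"

rows = logical_records(data).map { |record| CSV.parse_line(record) }
rows.each { |row| p row }
```

Joining with a space is a choice made for this sketch. Logstash's multiline codec keeps the original `\n` between joined lines, and an unquoted field that still contains a line break is rejected by the CSV parser.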

ghost, May 20 '16

I was just about to raise something like this as I have the same issue :)

If you open a CSV file in LibreOffice (or similar), it handles the CR/LF: it knows these lines belong to the same field because it keeps reading until the next comma.

We should really do the same here if we can!

markwalkom, Jul 19 '16

RFC 4180 says that line breaks within CSV fields should be handled: https://tools.ietf.org/html/rfc4180

(You can blame @PhaedrusTheGreek for pointing that out on the forums :p)

markwalkom, Jul 20 '16

I believe the spec says that only applies when the field is enclosed in double quotes.

PhaedrusTheGreek, Jul 20 '16

True!

markwalkom, Jul 20 '16
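To illustrate the RFC 4180 rule discussed above (a field containing a line break must be enclosed in double quotes), here is a small sketch using Ruby's standard `csv` library on the issue's example data, with the SUBJECTS field quoted. It assumes the whole record reaches the parser as one string; with line-oriented input, each physical line usually arrives as a separate event, so the break would already have split the record.

```ruby
require "csv"

# SUBJECTS is quoted, so the CRLF inside it is part of the field,
# not a row separator.
text = "1,ABC,FIVE,\"COMPUTERS\r\nSCIENCE\",$$$\r\n" \
       "2,EFG,FIVE,SCIENCE,$$$\r\n"

# Row separator given explicitly instead of relying on auto-detection.
rows = CSV.parse(text, row_sep: "\r\n")
rows.each { |row| p row }
```

The first row comes back with five fields, the fourth being "COMPUTERS\r\nSCIENCE"; the second row is unaffected.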

It also chokes on a trailing \r, which is what I end up with after using the multiline codec to join all the quoted line endings.

Error parsing csv {:field=>"message", :source=>"one,two,\"three\n\",four\r", :exception=>#<CSV::MalformedCSVError: Unquoted fields do not allow \r or \n (line 1).>}

OrangeDog, May 05 '17
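A Ruby sketch of one way around the trailing `\r`, assuming the joined message looks like the one in the error above: strip the line ending with `String#chomp` before parsing, so the only remaining line break is the quoted one, which RFC 4180 allows. This uses plain Ruby rather than a Logstash pipeline; in Logstash, the equivalent cleanup would need to happen before the csv filter runs.

```ruby
require "csv"

# The message from the error above: the multiline join left a trailing "\r".
message = "one,two,\"three\n\",four\r"

begin
  # Expected to fail as in the log above; the exact error text
  # depends on the csv library version.
  CSV.parse_line(message)
rescue CSV::MalformedCSVError => e
  puts "rejected: #{e.message}"
end

# chomp removes the trailing "\r", leaving only the quoted "\n".
row = CSV.parse_line(message.chomp)
p row
```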

This may be helpful: https://stackoverflow.com/questions/44640604/logstash-parse-multiline-csv-file

rickyk586, Oct 09 '20