logstash-filter-csv
Handle new lines within fields of a record
Hi, I am currently using Logstash 2.3.0 with the csv filter to specify and match columns on CSV data read by Filebeat. The issue is that, in a few cases, the CSV rows span multiple lines, i.e. a CRLF appears before the end of the row (due to the type of data present in the table, as this is a table export).
For example, my filter is: `csv { columns => ['ID','NAME','GRADE','SUBJECTS','EOF'] }`

It works for CSV data like this:

1, ABC, FIVE,COMPUTERS,$$$
2, EFG, FIVE,SCIENCE,$$$

It fails when a CRLF is present before the end of the row:

3, ABCV, FIVE, COMPUTERS
SCIENCE,$$$
4, ABCV, FIVE, COMPUTERS,$$$
So the rows containing a CRLF get rejected with a parse exception. Is there a way I can specify a row separator (in my case it is $$$CRLF)?
Or is there any configuration I can use to handle this scenario?
Please suggest. Thanks.
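One workaround sketch (untested against your data; the file path is hypothetical, and since you are shipping with Filebeat you would configure its own multiline options rather than the file input shown here): join continuation lines with the multiline codec before the csv filter, treating any line that does not end in $$$ as belonging to the next line, then strip the embedded line breaks so the csv filter sees a single-line record:

```
input {
  file {
    path => "/path/to/export.csv"   # hypothetical path
    codec => multiline {
      # any line that does NOT end in $$$ is merged with the next line
      pattern => "\$\$\$\s*$"
      negate => true
      what => "next"
    }
  }
}

filter {
  # the multiline codec keeps the embedded CR/LF in the joined event,
  # so remove it before csv parsing
  mutate {
    gsub => ["message", "[\r\n]+", " "]
  }
  csv {
    columns => ["ID","NAME","GRADE","SUBJECTS","EOF"]
  }
}
```
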
I was just about to raise something like this as I have the same issue :)
If you open a CSV file in LibreOffice or similar, it handles the CR/LF and knows it's still the same field because it's looking for the next comma.
We should really do the same here if we can!
The RFC says that line breaks within CSV fields should be supported: https://tools.ietf.org/html/rfc4180
(You can blame @PhaedrusTheGreek for pointing that out on the forums :p)
I believe the spec says that only applies if the field is enclosed in double quotes.
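For what it's worth, Ruby's CSV library (which sits underneath this filter, as the `CSV::MalformedCSVError` below shows) already follows that RFC 4180 rule — a sketch demonstrating both sides of it:

```ruby
require "csv"

# A CR/LF inside a double-quoted field is legal per RFC 4180,
# and Ruby's CSV parses it as one field:
row = CSV.parse_line(%Q{one,"two\nthree",four})
p row  # => ["one", "two\nthree", "four"]

# An unquoted \r inside a field is rejected (row_sep forced to "\n"
# here so the \r can't be mistaken for a row terminator):
begin
  CSV.parse_line("one,two\rthree,four", row_sep: "\n")
rescue CSV::MalformedCSVError => e
  puts "rejected: #{e.class}"
end
```

So the plugin gets quoted-newline handling for free; the problem is that the input reaches it one physical line at a time.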
True!
It also chokes on a trailing \r, which is what I end up with after using the multiline codec to join all the quoted line endings.
Error parsing csv {:field=>"message", :source=>"one,two,\"three\n\",four\r", :exception=>#<CSV::MalformedCSVError: Unquoted fields do not allow \r or \n (line 1).>}
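A sketch of one way around that trailing \r (assuming the joined record is in the message field): strip carriage returns with a mutate gsub placed before the csv filter, so no unquoted \r survives to trigger the error above:

```
filter {
  # run before the csv filter: drop the carriage returns left behind
  # by the multiline join, since unquoted fields may not contain \r
  mutate {
    gsub => ["message", "\r", ""]
  }
}
```
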
This may be helpful: https://stackoverflow.com/questions/44640604/logstash-parse-multiline-csv-file