logstash-filter-csv
Multiline value enclosed in double quotes cannot be parsed
Please post all product and debugging questions on our forum. Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here.
For all general issues, please provide the following details for fast resolution:
- Version: Logstash 7.3.0
- Operating System: Windows
The CSV RFC says that a value can contain multiple lines, separated by CRLF, as long as the value is enclosed in double quotes:
https://tools.ietf.org/html/rfc4180
6. Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
We have multiline fields, such as stack traces, stored in Elasticsearch. I exported them from Kibana Discover via CSV export and then tried to import them into another Elasticsearch instance via Logstash.
The csv filter throws an exception saying that a quote is missing, because it does not find the closing quote before the end of the line.
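For illustration, a minimal export along these lines reproduces the failure (the column names and values here are made up; the ; separator matches the csv filter configuration below):

    "@timestamp";"message"
    "2019-08-01 12:00:00.000";"java.lang.NullPointerException
        at com.example.Foo.bar(Foo.java:42)"

The second record spans two physical lines, because the stack trace continues on the next line inside the quoted field.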
Here is my configuration:
input
{
  file
  {
    path => ['C:/work/elastic/input/csv/*.csv']
    sincedb_path => "C:/work/elastic/input/csv/db"
    start_position => "beginning"
    codec => multiline
    {
      pattern => '(^\"\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}.\d{3}\")|(^\"\@timestamp\")'
      negate => "true"
      what => "previous"
      max_bytes => "200 MiB"
      max_lines => 10000
      auto_flush_interval => 2
    }
  }
}
filter
{
  # Workaround: without this gsub the csv filter fails.
  mutate
  {
    gsub => ["message", '\n"', '\\n"']
  }
  csv
  {
    autodetect_column_names => true
    autogenerate_column_names => true
    separator => ";"
    source => "message"
    skip_empty_columns => "true"
    target => "mycsv"
  }
}
I found a workaround using mutate's gsub to replace the literal newlines with escaped \n sequences, but I would still consider this a bug that should be fixed.
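To make the workaround concrete: the gsub pattern '\n"' matches a real newline character that is directly followed by a double quote, and the replacement '\\n"' writes back the literal two-character sequence \n plus the quote. So, after the multiline codec has joined the physical lines into one event, a span such as

    java.lang.NullPointerException<newline>"

becomes

    java.lang.NullPointerException\n"

and the csv filter receives a single physical line. (The <newline> marker is illustrative, not part of the data.)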
I also have the same case. Removing the \r from the line ends solved my problem.
mutate {
  gsub => [
    "message", "\r$", ""
  ]
}
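Neither comment spells out how the two fixes compose, but a minimal sketch combining both workarounds might look like this (same message field and csv options as above; stripping the \r first is an assumption, since the carriage return sits at the end of each joined line):

filter {
  # Strip the Windows carriage return at the end of each joined line
  # (second workaround).
  mutate {
    gsub => ["message", "\r$", ""]
  }
  # Escape the remaining real newlines that precede a double quote
  # (first workaround).
  mutate {
    gsub => ["message", '\n"', '\\n"']
  }
  csv {
    autodetect_column_names => true
    autogenerate_column_names => true
    separator => ";"
    source => "message"
    skip_empty_columns => "true"
    target => "mycsv"
  }
}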