logstash-filter-csv

Multiline value enclosed in double quotes cannot be parsed

Open • 0asp0 opened this issue 5 years ago • 1 comment

For all general issues, please provide the following details for fast resolution:

  • Version: logstash 7.3.0
  • Operating System: Windows

The CSV RFC (RFC 4180) says that a value can span multiple lines, separated by CRLF, as long as the value is enclosed in double quotes:

https://tools.ietf.org/html/rfc4180

 6.  Fields containing line breaks (CRLF), double quotes, and commas
       should be enclosed in double-quotes.  For example:

       "aaa","b CRLF
       bb","ccc" CRLF
       zzz,yyy,xxx
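
To illustrate the rule quoted above, here is a minimal Python sketch (not part of the original report). It feeds the RFC example to Python's standard `csv` module, which is used here only as a stand-in for an RFC 4180-aware parser and says nothing about how the Logstash csv filter is implemented.

```python
import csv
import io

# The RFC 4180 example, written with real CRLF line breaks.
data = '"aaa","b\r\nbb","ccc"\r\nzzz,yyy,xxx\r\n'

# newline="" leaves the CRLF inside the quoted field untranslated.
rows = list(csv.reader(io.StringIO(data, newline="")))

for row in rows:
    print(row)
```

The first record keeps its embedded line break inside the second field, so the three physical lines yield two records.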

We have multiline fields, such as stack traces, stored in Elasticsearch. I exported them from Kibana's Discover view via CSV export. Then I tried to import them via Logstash into another Elasticsearch instance.

The CSV filter throws an exception saying that a quote is missing, because it does not find the closing quote, which sits on a following line.
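
This kind of failure can be reproduced outside Logstash. The following is a hedged Python sketch, assuming a parser that is handed only one physical line of a multiline record. Python's `csv` module with `strict=True` stands in for such a parser; the exception type and message raised by the csv filter itself will differ.

```python
import csv

# Only the first physical line of a record whose quoted field spans two lines.
first_physical_line = '"aaa","b\r\n'

error = None
try:
    # strict=True makes the reader raise when input ends inside a quoted field.
    next(csv.reader([first_physical_line], strict=True))
except csv.Error as exc:
    error = exc

print("parse failed:", error is not None)
```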

Here is my pipeline configuration (input and filters):

    input
    {
      file
      {
        path => ['C:/work/elastic/input/csv/*.csv']
        sincedb_path => "C:/work/elastic/input/csv/db"
        start_position => "beginning"
        codec => multiline
        {
          pattern   => '(^\"\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}.\d{3}\")|(^\"\@timestamp\")'
          negate    => "true"
          what      => "previous"
          max_bytes => "200 MiB"
          max_lines => 10000
          auto_flush_interval => 2
        }
      }
    }

    filter
    {
      # workaround. Without gsub it will fail
      mutate
      {
        gsub =>  ["message", '\n"', '\\n"']
      }
      csv
      {
        autodetect_column_names => true
        autogenerate_column_names => true
        separator => ";"
        source => "message"
        skip_empty_columns => "true"
        target => "mycsv"
      }
    }

I found a workaround using mutate's gsub: it replaces each real newline that directly precedes a double quote with a literal \n. Still, I consider this a bug that should be fixed.
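
As a rough illustration of what that substitution does to the event text, here is a Python sketch using `re.sub` with the same pattern and replacement. The sample `message` value is hypothetical, and Python's regex engine is only an approximation of the Ruby engine that Logstash uses.

```python
import re

# Hypothetical record: the quoted field ends with a real line break
# immediately before its closing double quote.
message = '"aaa";"trace line\n";"ccc"'

# Replace newline-plus-quote with backslash, 'n', quote (a literal \n).
escaped = re.sub(r'\n"', r'\\n"', message)

print(escaped)
```

Note that this pattern only touches a newline that is immediately followed by a double quote; newlines elsewhere in a field are left as they are.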

0asp0, Aug 23 '19 12:08

I have the same case. Removing the \r from the end of each line solved my problem.

  mutate {
    gsub => [
      "message", "\r$", ""
    ]
  }
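
For readers who want to see the effect of that pattern, here is a small Python sketch with a hypothetical sample `message`. In Ruby regular expressions `$` matches at the end of every line, so the Python equivalent needs `re.MULTILINE` to behave the same way.

```python
import re

# Hypothetical event text with CRLF inside a quoted field and a trailing CR.
message = '"aaa";"b\r\nbb";"ccc"\r'

# Drop every carriage return that sits at the end of a line.
cleaned = re.sub(r'\r$', '', message, flags=re.MULTILINE)

print(repr(cleaned))
```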

blacksudoku, Dec 16 '20 03:12