
Make it possible to specify quoted values for the ltsv parser plugin

Open MetalRex101 opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe.

The LTSV parser plugin can't handle quoted values. For example, given the log line:

level=warn ts=2021-12-20T05:56:00.397096942Z caller=operator.go:516 component=alertmanageroperator msg="alertmanager key=kube-system/prometheus-operator-kube-s-alertmanager, field spec.baseImage is deprecated, 'spec.image' field should be used instead"

and the ltsv parser with the following settings:

<parse>
  @type ltsv
  delimiter_pattern /\s+/
  label_delimiter =
  time_key ts
  time_format %Y-%m-%dT%H:%M:%S.%N%Z
</parse>

Expected JSON:

{
  "level": "warn",
  "ts": "2021-12-20T05:56:00.397096942Z",
  "caller": "operator.go:516",
  "component": "alertmanageroperator",
  "msg": "alertmanager key=kube-system/prometheus-operator-kube-s-alertmanager, field spec.baseImage is deprecated, 'spec.image' field should be used instead"
}

Actual result:

{
  "level": "warn",
  "ts": "2021-12-20T05:56:00.397096942Z",
  "caller": "operator.go:516",
  "component": "alertmanageroperator",
  "msg": "\"alertmanager",
  "key": "kube-system/prometheus-operator-kube-s-alertmanager,"
}

The msg field is truncated, and a spurious key field is added to the resulting JSON. The line is parsed incorrectly because the parser does not recognize that the entire msg value is enclosed in double quotes.
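The truncation can be reproduced outside Fluentd with a few lines of Ruby. This is a sketch that only approximates a quote-unaware parse (split on runs of whitespace, then split each token once on `=`); it is not the ltsv plugin's actual code, and the real plugin may treat tokens without a label delimiter differently.

```ruby
# Sample line from the issue; %q{} avoids escaping its single and double quotes.
LINE = %q{level=warn ts=2021-12-20T05:56:00.397096942Z caller=operator.go:516 component=alertmanageroperator msg="alertmanager key=kube-system/prometheus-operator-kube-s-alertmanager, field spec.baseImage is deprecated, 'spec.image' field should be used instead"}

# Approximation of a quote-unaware parser: break the line on whitespace,
# then split each token once on the label delimiter "=".
record = {}
LINE.split(/\s+/).each do |pair|
  key, value = pair.split('=', 2)
  record[key] = value # tokens without "=" (e.g. "field") get a nil value
end

p record['msg'] # => "\"alertmanager"
p record['key'] # => "kube-system/prometheus-operator-kube-s-alertmanager,"
```

The opening quote ends up inside msg, and `key=` from the middle of the message becomes a field of its own, matching the actual result above.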

Describe the solution you'd like

The LTSV plugin recognizes that a value after label_delimiter may be enclosed in a symbol sequence, for example double or single quotes. This would solve cases like the one above.
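One possible shape for this behavior, as a minimal Ruby sketch: a single regular expression that accepts either a bare value or a value wrapped in double or single quotes. `PAIR_PATTERN` and `parse_quoted_pairs` are hypothetical names, not part of Fluentd, and the sketch assumes a quoted value never contains its own quote character (there is no escape handling).

```ruby
# Hypothetical pattern: a key, "=", then a double-quoted, single-quoted
# or bare value. No escape handling inside quotes.
PAIR_PATTERN = /
  ([^\s=]+)=      # key: characters other than whitespace and "="
  (?:
    "([^"]*)"     # double-quoted value
    | '([^']*)'   # single-quoted value
    | (\S*)       # bare value, up to the next whitespace
  )
/x

# Returns a Hash of key => value; whichever capture matched supplies the value.
def parse_quoted_pairs(line)
  line.scan(PAIR_PATTERN).to_h { |key, dq, sq, bare| [key, dq || sq || bare] }
end

LINE = %q{level=warn ts=2021-12-20T05:56:00.397096942Z caller=operator.go:516 component=alertmanageroperator msg="alertmanager key=kube-system/prometheus-operator-kube-s-alertmanager, field spec.baseImage is deprecated, 'spec.image' field should be used instead"}

puts parse_quoted_pairs(LINE)['msg']
puts parse_quoted_pairs(%q{a='x y' b=2})['a']
```

Because a quoted alternative consumes everything up to its closing quote, the `key=` inside the msg value is never seen as a new pair.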

Describe alternatives you've considered

This forced me to use the regexp parser instead and capture everything after the ts field as the message, with the following expression: /level=(?<level>.*)\sts=(?<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3,}[A-Z]+) (?<message>.*)/ This is a much worse solution than what a proper ltsv plugin implementation could provide.
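For comparison, here is how that workaround expression behaves on the sample line in plain Ruby; its named captures are what the regexp parser turns into fields. The sketch only exercises the expression from the issue and does not load Fluentd.

```ruby
LINE = %q{level=warn ts=2021-12-20T05:56:00.397096942Z caller=operator.go:516 component=alertmanageroperator msg="alertmanager key=kube-system/prometheus-operator-kube-s-alertmanager, field spec.baseImage is deprecated, 'spec.image' field should be used instead"}

# The workaround expression from the issue, unchanged.
WORKAROUND = /level=(?<level>.*)\sts=(?<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3,}[A-Z]+) (?<message>.*)/

m = WORKAROUND.match(LINE)
puts m[:level]     # warn
puts m[:timestamp] # 2021-12-20T05:56:00.397096942Z
puts m[:message]   # caller=..., component=... and msg=... as one string
```

caller, component and msg all stay inside message as a single unparsed string, which is the drawback described above.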

Additional context

No response

MetalRex101, Dec 20 '21 13:12

Is there any specification regarding LTSV's double-quote escapes? In particular, I'm concerned about:

  • What happens if a double-quoted field contains other double quotes in its value?
  • Are double quotes only significant for values? (Or can keys be double-quoted as well?)
  • Are there any other special characters? (what about \ and '?)

That said, I'm personally unimpressed with this format extension, especially because LTSV was initially pitched as being super easy & fast to parse ("hey, you can just split the string with \t and parsing is done!").

Now it's slowly falling into the same rut as CSV, requiring a state machine to parse it properly. I have a vague feeling of history repeating itself here!
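To make the state-machine point concrete, here is a minimal Ruby sketch of what quote-aware parsing involves once escapes are allowed. The rules it implements are assumptions, because the questions above have no agreed answer: only values may be quoted, only with double quotes, a backslash escapes the next character inside quotes, and words without `=` are dropped. `tokenize` is a hypothetical helper.

```ruby
# Hypothetical tokenizer. Assumed rules, not taken from any LTSV spec:
# pairs are separated by whitespace, only values may be double-quoted,
# inside quotes a backslash escapes the next character, and words
# without "=" are dropped.
def tokenize(line)
  record = {}
  key = +''
  value = +''
  state = :key

  line.each_char do |ch|
    space = ch.match?(/\s/)
    case state
    when :key
      if ch == '='
        state = :value_start
      elsif space
        key = +'' # a word with no "=" before the space: discard it
      else
        key << ch
      end
    when :value_start # first character after "="
      if ch == '"'
        state = :quoted
      elsif space
        record[key] = '' # empty value, e.g. "key= "
        key = +''
        state = :key
      else
        value << ch
        state = :bare
      end
    when :bare
      if space
        record[key] = value
        key = +''
        value = +''
        state = :key
      else
        value << ch
      end
    when :quoted
      if ch == '\\'
        state = :escaped
      elsif ch == '"'
        record[key] = value # closing quote ends the value
        key = +''
        value = +''
        state = :key
      else
        value << ch
      end
    when :escaped
      value << ch # keep the escaped character literally
      state = :quoted
    end
  end

  # Flush whatever pair was still open at the end of the line.
  case state
  when :bare then record[key] = value
  when :value_start then record[key] = ''
  when :quoted, :escaped then raise ArgumentError, 'unterminated quoted value'
  end
  record
end

p tokenize(%q{level=warn msg="say \"hi\" now" n=1})
```

Each further rule (single quotes, quoted keys, escapes outside quotes) would add more states, which is the CSV-style growth being described.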

fujimotos, Jan 13 '22 01:01

Absolutely agree with you here. Maybe applications should not write logs in LTSV format at all if doing so makes them complicated to parse. I think only values should be quoted. Any other double quotes inside a value that do not mark its end should certainly be escaped, and that is the application's responsibility. The log parser should not have to solve every problem, but I think we can make a small improvement: parse a value from the first quote symbol to the second. We could also consider letting users replace the default quote delimiter (double quotes) with other symbol sequences, for a bit more flexibility.
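A minimal Ruby sketch of this narrower proposal: a value runs from the opening quote sequence to the next closing sequence, with no escape processing (escaping is left to the application), and both sequences are configurable. `pair_pattern`, `parse_pairs` and their `open_quote`/`close_quote` parameters are hypothetical, not existing Fluentd options.

```ruby
# Hypothetical helper: a pair is key=OPEN...CLOSE or key=bare. The quoted
# value runs from the opening sequence to the first closing sequence
# (lazy match); escaping inner quotes is left to the application.
def pair_pattern(open_quote, close_quote)
  o = Regexp.escape(open_quote)
  c = Regexp.escape(close_quote)
  /([^\s=]+)=(?:#{o}(.*?)#{c}|(\S*))/m
end

def parse_pairs(line, open_quote = '"', close_quote = open_quote)
  pattern = pair_pattern(open_quote, close_quote)
  line.scan(pattern).to_h { |key, quoted, bare| [key, quoted || bare] }
end

puts parse_pairs('msg="hello world" n=2')['msg']
puts parse_pairs('msg=<<hello world>> n=2', '<<', '>>')['msg']
```

`Regexp.escape` matters here: a configured sequence such as `|~` would otherwise be read as regex syntax rather than as literal characters.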

MetalRex101, Jan 13 '22 15:01

I found this delimiter expression

/\s(?=(?:[^"]*"[^"]*")*[^"]*$)/

here https://github.com/GEBITSolutions/fluent-plugin-fortigate-logs-parser/blob/9f5e3c31e762d5839e0f446054f9347da1892879/lib/fluent/plugin/parser_fortigate_logs.rb#L5
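A minimal Ruby sketch of splitting the sample line on this lookahead delimiter, written with `\s` so that it matches whitespace. The lookahead accepts a whitespace character only if an even number of double quotes follows it, i.e. only if it lies outside any quoted value. The sketch assumes every token contains `=`, splits each one once on `=`, and assumes no escaped quotes occur.

```ruby
# Whitespace counts as a delimiter only if an even number of double quotes
# follows it, i.e. it is not inside a quoted value.
QUOTE_AWARE_DELIMITER = /\s(?=(?:[^"]*"[^"]*")*[^"]*$)/

LINE = %q{level=warn ts=2021-12-20T05:56:00.397096942Z caller=operator.go:516 component=alertmanageroperator msg="alertmanager key=kube-system/prometheus-operator-kube-s-alertmanager, field spec.baseImage is deprecated, 'spec.image' field should be used instead"}

tokens = LINE.split(QUOTE_AWARE_DELIMITER)
# Every token in this sample contains "=", so each splits into [key, value].
record = tokens.to_h { |pair| pair.split('=', 2) }

puts record.keys.inspect
puts record['msg'] # the value still carries its surrounding double quotes
```

Two caveats show up in the sketch: the surrounding quotes remain part of the msg value, and each candidate whitespace re-scans the rest of the line, so the cost can grow quadratically with line length. An escaped `\"` inside a value would also throw off the quote count.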

Even better - use the logfmt parser :)

adn77, Jan 14 '23 14:01