logstash-filter-kv
kv filter: support escaped quotes
Migrated from JIRA: https://logstash.jira.com/browse/LOGSTASH-2272 which was replicated at https://github.com/elasticsearch/logstash/issues/1605
With the following config:
input { stdin { } }
filter { kv { } }
output {
  stdout {
    codec => 'json_lines'
  }
}
The following message:
foo="bar \"baz\""
Should create the following output:
{"message":"foo=\"bar \\\"baz\\\"\"","@version":"1","@timestamp":"1969-01-01T01:01:01.000Z","host":"host.example.net","foo":"bar \"baz\""}
But instead creates the following output:
{"message":"foo=\"bar \\\"baz\\\"\"","@version":"1","@timestamp":"1969-01-01T01:01:01.000Z","host":"host.example.net","foo":"bar \\"}
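To see where the truncated value comes from, here is a minimal Ruby reproduction. It assumes that the quoted-value branch of the kv pattern behaves like "([^"]+)" (the line quoted later in this thread). NAIVE_PAIR is a simplified, hypothetical stand-in, not the plugin's full pattern.

```ruby
# Simplified stand-in for the kv quoted-value branch (an assumption, not the
# plugin's full pattern): a quoted value is any run of non-quote characters.
NAIVE_PAIR = /(\w+)="([^"]+)"/

# Single quotes keep the backslashes literal, so this is: foo="bar \"baz\""
message = 'foo="bar \"baz\""'

key, value = message.match(NAIVE_PAIR).captures
puts key   # prints foo
puts value # prints bar \  ([^"]+ stops at the first quote, escaped or not)
```

The character class cannot tell an escaped quote from a closing one, so the value ends right after the backslash, which is the "bar \\" seen in the JSON output above.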
Confirmed this bug.
DO NOT USE THIS WORKAROUND; see the update note below.
For those looking for a workaround, this is what I've done (assuming kvdata is the field that holds the key-value pairs):
mutate {
  gsub => ["kvdata","\\"","%22"]
}
kv {
  source => "kvdata"
  field_split => ","
}
urldecode {
  all_fields => true
}
This essentially URL-encodes the escaped quote and then URL-decodes it after the kv filter. Obviously not very efficient, but it gets the job done.
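For illustration only (the thread advises against this workaround), the same three steps can be sketched in plain Ruby. The pair pattern is a simplified stand-in for the kv filter, and URI.decode_www_form_component stands in for the urldecode filter; both are assumptions, and kv_with_encoded_quotes is a hypothetical name.

```ruby
require "uri"

# Plain-Ruby sketch of the mutate/kv/urldecode workaround (not Logstash code).
def kv_with_encoded_quotes(text)
  # mutate/gsub step: turn every escaped quote \" into %22
  encoded = text.gsub('\\"', "%22")
  # kv step: with no inner quotes left, the naive quoted-value pattern works
  pairs = encoded.scan(/(\w+)="([^"]+)"/)
  # urldecode step: decode each value, so %22 becomes a quote again
  pairs.map { |k, v| [k, URI.decode_www_form_component(v)] }.to_h
end

p kv_with_encoded_quotes('foo="bar \"baz\""')
# A literal %22 that was already in the data is decoded into a quote as well:
p kv_with_encoded_quotes('foo="50%22 off"')
```

Besides the extra encode/decode pass, the sketch shows a side effect: anything in the data that already looks URL-encoded (a literal %22, or a + that form decoding turns into a space) is altered too.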
Note that I've also tried using gsub to replace \" with a quote and then gsub the quote back to a double quote (I tried both escaped and unescaped double quotes), but the resulting string ends up with a double-escaped quote. And even if that had worked, it would become a problem whenever the string legitimately contains a quote.
UPDATE 10/12/2015: As of this writing, I would advise against using the workaround above, as it triggers another long-standing bug, https://github.com/elastic/logstash/issues/3780, which has something to do with the urldecode/urlencode filter (I've verified that I get the same error when I apply this workaround). It is still present as of v1.5.3.
This bug is affecting me as well. It looks like it would be a fairly simple regex change; are there any specific constraints on contributions?
Changing the line
From
valueRxString = "(?:\"([^\"]+)\"|'([^']+)'"
To
valueRxString = "(?:\"((\\\\.|[^\"])+)\"|'((\\\\.|[^'])+)'"
Taken from this SO post.
That should fix it, right? If I have some time this weekend, I'll look into actually testing this.
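The proposed change relies on the common idiom for quoted strings with backslash escapes: inside the quotes, match either a backslash plus the character it escapes, or any character that is neither a quote nor a backslash. The Ruby sketch below applies that idiom as a standalone pattern and then strips the escaping backslashes so the value matches the expected output in this issue. It is a variant of the proposed line, not the plugin's actual code; ESCAPED_PAIR and parse_pair are hypothetical names.

```ruby
# Escape-aware quoted value (hypothetical stand-in for the plugin pattern):
#   \\.     a backslash together with the character it escapes
#   [^"\\]  any other character except a quote or a backslash
ESCAPED_PAIR = /(\w+)="((?:\\.|[^"\\])*)"/

def parse_pair(message)
  m = message.match(ESCAPED_PAIR)
  return nil if m.nil?
  # Remove the escaping backslash: \" becomes " and \\ becomes \
  [m[1], m[2].gsub(/\\(.)/, '\1')]
end

p parse_pair('foo="bar \"baz\""')
```

Excluding the backslash from the second alternative forces every backslash to pair with the character after it, so only an unescaped quote can close the value.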
I applied the change proposed by @placeybordeaux and extended the existing test case. Unfortunately, I wasn't able to run the tests locally (not a Ruby dev myself), but the code behaves as it should in my manual tests.
Here's the diff, let me know if I should open a PR: https://github.com/logstash-plugins/logstash-filter-kv/compare/master...piquadrat:issue-2?expand=1
Are there any updates on this issue? Is this issue still unresolved?
Hi all, does the 'escaped quote' problem still exist? I still get the bug in logstash-filter-kv-4.0.2.
Hi all, the 'escaped quote' problem still exists. How can I fix it? I still get the bug in logstash-filter-kv-4.0.1.
@untergeek
Hi. Escaped quotes cause parsing errors for me. My quoted value contains escaped quotes as well as the field-split character. When parsing, the value is split at the field-split character instead of the whole double-quoted string being kept as one value.
My data format is:
"\"aaaa\" bbbb /cde"
Configured as:
kv {
  source => "message"
  field_split => "[,\s]"
  value_split => "="
  trim_key => "\s"
  trim_value => "\""
}
The result is:
value = \"aaaa\"
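One way to keep such a value whole is to try the quoted, escape-aware alternative first and only fall back to splitting on the separator for unquoted values. The Ruby sketch below assumes the value appears as key="..." in the message and that a comma or whitespace separates fields (mirroring the field_split above); PAIR and split_pairs are hypothetical names, not part of the kv filter.

```ruby
# Hypothetical tokenizer: a value is either a double-quoted string with
# backslash escapes, or a bare run of characters up to the next , or whitespace.
PAIR = /(\w+)=(?:"((?:\\.|[^"\\])*)"|([^,\s]*))/

def split_pairs(line)
  pairs = line.scan(PAIR).map do |key, quoted, bare|
    # Only quoted values need their escaping backslashes removed.
    [key, quoted ? quoted.gsub(/\\(.)/, '\1') : bare]
  end
  pairs.to_h
end

p split_pairs('msg="\"aaaa\" bbbb /cde", level=info')
```

Because the quoted alternative consumes everything up to the unescaped closing quote, the spaces inside it never act as field separators.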
Is there any hope for this issue? One of my newer colleagues just opened an internal bug ticket because he noticed parse issues due to this bug in our logging system. I had to point him at duplicate bugs going back YEARS. We never stopped suffering from this.