logstash-codec-json
Optionally support handling of \x escape codes
A clever way of generating JSON has been observed in nginx and Apache httpd configs: using the normal log formatting functions of these webservers to write JSON instead of plain text.
However, because nginx and apache try to log readable text, any non-printable character will appear as a two-digit, hex-encoded escape, like \x09.
We already make concessions for "invalid json" due to often-misconfigured nxlog json (via charset settings), so I think we can make another concession here for \xNN codes.
Proposal:
input {
  redis {
    codec => json { escaped_hex_codes => true }
  }
}
A new setting, escaped_hex_codes, would, when true, automatically process \xNN text before passing it to the JSON parser.
I am open to input on a better setting name than escaped_hex_codes.
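As a sketch of what that pre-processing could look like, here is a standalone Ruby version. Everything here is hypothetical: the helper name and the exact escaping choices are assumptions, not part of the codec. It rewrites \x22 and \x5C back into valid JSON escapes (so the document stays parseable) and turns other control bytes into \uNNNN escapes:

```ruby
require 'json'

# Hypothetical sketch of the proposed pre-processing: rewrite nginx-style
# \xNN escapes into valid JSON escapes before the JSON parser sees them.
def decode_hex_escapes(raw)
  raw.gsub(/\\x([0-9a-fA-F]{2})/) do
    byte = Regexp.last_match(1).to_i(16)
    case byte
    when 0x22 then '\"'        # keep quotes escaped so the JSON stays valid
    when 0x5C then '\\\\'      # keep backslashes escaped
    else
      # control bytes become \uNNNN escapes; printable bytes are decoded
      byte < 0x20 ? format('\u%04X', byte) : byte.chr
    end
  end
end

line = '{"agent":"UA\x09Mozilla/5.0","body":"{\x22foo\x22:\x22bar\x22}"}'
event = JSON.parse(decode_hex_escapes(line))
# event['agent'] now contains a real tab; event['body'] is {"foo":"bar"}
```

The key design point is that \x22 and \x5C must stay escaped rather than being decoded to raw characters, otherwise the surrounding JSON string breaks again.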
Example nginx log_format that can potentially create JSON that makes parsing barf:
log_format json '{"time":"$time_iso8601",'
                '"host":"$http_host",'
                '"remote":"$remote_addr",'
                '"origin":"$http_x_forwarded_for",'
                '"request":"$request",'
                '"code":$status,'
                '"size":$body_bytes_sent,'
                '"referrer":"$http_referer",'
                '"agent":"$http_user_agent"}';
The problem usually shows up in the $http_user_agent variable.
Example output that causes an exception:
{
  ...[snip]...
  "agent":"User-Agent\x09Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
}
Another sample nginx log message with escaped double quotes. This is generated by logging the body of a JSON POST:
{"time":"2015-01-15T19:29:42+00:00","host":"www.sdlkjsdlkj.com","remote":"10.20.30.40","origin":"12.34.56.78","request":"POST / HTTP/1.1","code":200,"size":0,"referrer":"-","agent":"curl/7.30.0","body":"{\x22foo\x22:\x22bar\x22}"}
Has there been any advancement or workarounds for this?
I just tried to add some app-related fields to the nginx access log (to avoid joining two logs by UUID) and ran into this =( Any plans to make this real? Like @kurtado, one of our fields is JSON, and I think we'll end up using a ruby filter to unescape it.
So, has anyone found a solution yet? My head hurts now ~_~!
Example text: {\x22name\x22:\x22tom\x5Ccat\x22}. Properly unescaped, the result should be: {"name":"tom\cat"}. This is possible directly in the Ruby runtime, but how do I do it in a Logstash environment?
For now, I use this workaround (unescaping \x22 to a quote and \x5C to a backslash):
ruby {
  code => 'str = event["request_body"]; str = str.gsub(%q(\x22), %q(")).gsub(%q(\x5C), %q(\\)); event["request_body"] = str;'
}
ruby {
  code => "a = eval(%Q{\"#{event['escaped_parsed_request'][1..-2]}\"}); event['unescaped_parsed_request'] = (a == '-') ? '{}' : a"
}
@jordansissel is this the same codec that is used in the json filter? Because the json filter is failing for me the same way =(
I will work on this soon. I think we can come up with a workable solution; I'll report back.
Great, thanks. If there's anything in particular you need help with, I'd like to volunteer (I mean this task, not the yellow leaves in your back yard :grin:).
@burdandrei haha, thank you for the offer of help! I'm hoping we can make a patch of some kind available soon that we'd love help testing. Stay tuned. :)
Here's a ruby filter based solution:
ruby {
  code => "event['message'] = event['message'].scan(/(?:\\x([a-f0-9][a-f0-9]))|(.)/i).map { |x| if x[0] == '22' then '\\"' elsif x[0] == '27' then '\\\'' elsif (x[0] == '5C' || x[0] == '5c') then '\\\\' else x[1] || x[0].hex.chr end}.join"
}
json {
  source => "message"
  # charset => "UTF-8"
}
Probably not 100% UTF8/MB safe.
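For anyone who wants to sanity-check that scan logic outside Logstash, here is a standalone sketch of the same rewrite (the helper name is made up for illustration):

```ruby
require 'json'

# Standalone version of the scan-based rewrite above. \x22, \x27 and
# \x5C are turned back into JSON-safe escapes; any other \xNN byte is
# decoded to a raw character.
def fix_hex_escapes(msg)
  msg.scan(/(?:\\x([a-f0-9]{2}))|(.)/i).map do |hex, chr|
    case hex
    when nil        then chr       # ordinary character, pass through
    when '22'       then '\"'      # \x22 -> escaped double quote
    when '27'       then "\\'"     # \x27 -> escaped single quote
    when '5c', '5C' then '\\\\'    # \x5C -> escaped backslash
    else hex.hex.chr               # any other byte, decode literally
    end
  end.join
end

raw = '{"body":"{\x22foo\x22:\x22bar\x22}"}'
puts JSON.parse(fix_hex_escapes(raw))['body']   # prints {"foo":"bar"}
```

One caveat worth knowing: bytes other than \x22/\x27/\x5C (for example \x09) become raw control characters, which a strict JSON parser may still reject, so how well this works depends on how lenient the downstream parser is.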
Thanks to @splitice for the ruby filter code. I have replaced my gsub filter with the ruby one; I am no longer receiving JSON decode errors in my log file, and accesses to resources now correctly appear in Kibana.
👍
@splitice I'm trying to use your solution, but I get:
{:timestamp=>"2016-09-29T10:57:10.380000+0000", :message=>"Ruby exception occurred: undefined method `scan' for nil:NilClass", :level=>:error}
I had to change the input codec from json to plain. @yalok, I think this is what you missed.
I also had to tweak the ruby code to avoid saturating all 24 cores:
filter {
  ruby {
    code => "if event['message'].include?('\x') then event['message'] = event['message'].scan(/(?:\\x([a-f0-9][a-f0-9]))|(.)/i).map { |x| if x[0] == '22' then '\\"' elsif x[0] == '27' then '\\\'' elsif (x[0] == '5C' || x[0] == '5c') then '\\\\' elsif x[0] == '0A' then '\\n' else x[1] || x[0].hex.chr end}.join end"
  }
  json {
    source => "message"
  }
  mutate {
    remove_field => ["message"]
  }
}
This adds \n encoding and checks for \x before doing any changes. Otherwise it's what @splitice posted.
As a further addendum to generalize this, you could rewrite the scan logic as:
event['message'].scan(/(?:\\x([a-f0-9]{2}))|(.)/i).map { |x| x[0] ? [x[0].to_i(16)].pack('U') : x[1] }.join
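A standalone sketch of that generalized form, with one caveat worth flagging: it decodes \x22 to a raw quote rather than an escaped one, so unlike the earlier version it does not keep the surrounding JSON valid.

```ruby
# Generalized decoder from the comment above, wrapped for standalone use.
# Every \xNN is treated as a Unicode codepoint and packed into UTF-8.
decode = lambda do |s|
  s.scan(/(?:\\x([a-f0-9]{2}))|(.)/i)
   .map { |hex, chr| hex ? [hex.to_i(16)].pack('U') : chr }
   .join
end

puts decode.call('tom\x5Ccat').inspect   # => "tom\\cat"
puts decode.call('\x22hi\x22')           # raw quotes: no longer JSON-safe
```

Also note that pack('U') treats each byte as a codepoint, so a multi-byte UTF-8 character escaped byte-by-byte (e.g. \xC3\xA9 for é) comes out as mojibake ("Ã©"), which matches the "not 100% UTF8/MB safe" caveat above.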
Can confirm we are still using a variant of the code I posted way back in April. We lose a small percentage of messages due to invalid sequences, unfortunately.
I'll probably have a go at fixing that in 2017, unless someone with more time than I have tackles the more fundamental character decoding issue.
@splitice @yalok We got the same error now. =_=
ruby {
  code => "event['message'] = event['message'].scan(/(?:\\x([a-f0-9][a-f0-9]))|(.)/i).map { |x| if x[0] == '22' then '\\"' elsif x[0] == '27' then '\\\'' elsif (x[0] == '5C' || x[0] == '5c') then '\\\\' else x[1] || x[0].hex.chr end}.join"
}
json {
  source => "message"
  # charset => "UTF-8"
}
Ruby exception occurred: undefined method `scan' for nil:NilClass {:level=>:error}
Well, the error is telling you that message is nil. I've never had that, and if you are feeding in a source that has a message field it can't happen. It's probably a misconfiguration of your input.
Maybe the best way is to URL-decode before JSON decoding.
@vus520 urlencoding is an entirely different format: %20 etc.
@splitice how about replacing \x with % first, then urldecoding?
That would result in strings containing %[num] being incorrectly translated.
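A quick illustration of that pitfall (the sample string is made up):

```ruby
require 'cgi'

# Why "replace \x with % and urldecode" corrupts data: any literal
# percent-escape already present in the message gets decoded too.
raw = 'GET /search?q=a%20b\x22 HTTP/1.1'   # %20 is part of the real URL
naive = CGI.unescape(raw.gsub('\x', '%'))
puts naive
# The \x22 becomes a quote as intended, but the literal %20 is also
# (wrongly) turned into a space, corrupting the original value.
```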
Good news for nginx users: log_format supports json escaping starting from nginx 1.11.8:
http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
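With that nginx version, the fix belongs in nginx itself. A minimal sketch (format name, fields, and log path are placeholders, abbreviated from the format earlier in the thread):

```nginx
# escape=json (nginx >= 1.11.8) emits \", \\, \t, \u00XX and so on
# instead of the \xNN escapes that break JSON parsers.
log_format json escape=json '{"time":"$time_iso8601",'
                            '"request":"$request",'
                            '"agent":"$http_user_agent"}';
access_log /var/log/nginx/access.json json;
```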
Has anyone done any compatibility testing with strange encoding or real-world data?
I noticed there isn't any JSON support for nginx's error_log yet.
@splitice try this one:
filter {
  ruby {
    code => "if event['message']
      event['message'] = event['message'].gsub('\x22','').gsub('\x9B','').gsub('\x08','').gsub('\x09','').gsub('\x','')
    end"
  }
  json {
    source => "message"
  }
}
Not the best way, replacing them one by one...
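If the goal is just to strip every \xNN sequence rather than enumerate them one by one, a single regex does it in one pass. Note this keeps the same destructive semantics as the chain above: the escaped bytes are dropped, not decoded.

```ruby
# Single-pass alternative to the chained gsub calls above: strip every
# \xNN escape (and any stray trailing \x) in one substitution.
strip_hex = ->(s) { s.gsub(/\\x(?:[0-9A-Fa-f]{2})?/, '') }

puts strip_hex.call('a\x22b\x09c')   # => "abc"
```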
`code => 'event["request_body"] = event["request_body"].gsub("\\x22","\"").gsub("\\x5C","/")'`
Tested a lot; this works.