logstash-filter-grok
logstash-filter-grok copied to clipboard
Why is this filter "slow"? A brief analysis
Total time to process the same apache log line 1_000_000 times with %{COMMONAPACHELOG}: 93.88s (~10k e/s)
The 6 biggest time offenders use 70.46s - 75% of total time:
| Action | Time (s) | % of Total | Times called per event |
|---|---|---|---|
| Grok#handle (self) | 21.16 | 22% | 10x |
| Regexp#match | 19.48 | 20% | 1x |
| Event#[]= | 15.58 | 16% | 9x |
| Event#[] | 8.51 | 9% | 10x |
| Logger.debug? | 5.73 | 6% | 3x |
| Proc#call | 4.54 | 4% | 11x |
Note: this is using:
- logstash-core master;
- logstash-core-event-java master
- logstash-filter-grok master
Ruby script used to test:
# encoding: utf-8
require 'jruby/profiler'
module LogStash
end
module LogStash::Environment
# running the grok code outside a logstash package means
# LOGSTASH_HOME will not be defined, so let's set it here
# before requiring the grok filter
unless self.const_defined?(:LOGSTASH_HOME)
LOGSTASH_HOME = File.expand_path("../../../", __FILE__)
end
# also :pattern_path method must exist so we define it too
unless self.method_defined?(:pattern_path)
def pattern_path(path)
::File.join(LOGSTASH_HOME, "patterns", path)
end
end
end
require 'logstash/event'
require 'logstash/environment'
require 'logstash/filters/grok'
require 'logstash/codecs/base'
filter = LogStash::Filters::Grok.new(
"match" => { "message" => "%{COMMONAPACHELOG}" },
)
filter.register
print "generating data.."
data = 1_000_000.times.map do
message = '198.46.149.143 - - [04/Jun/2015:02:29:31 +0000] "GET /blog/geekery/solving-good-or-bad-problems.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+semicomplete%2Fmain+%28semicomplete.com+-+Jordan+Sissel%29 HTTP/1.1" 200 10756 "-" "Tiny Tiny RSS/1.11 (http://tt-rss.org/)"'
LogStash::Event.new("message" => message)
end
puts "done. benchmarking..."
profile_data = JRuby::Profiler.profile do
data.each do |event|
filter.filter(event)
end
end
profile_printer = JRuby::Profiler::GraphProfilePrinter.new(profile_data)
profile_printer.printProfile(STDOUT)
The full profile.graph output can be found here
On a note: JRuby 9.0.5.0 performs similarly, if not slightly faster (after working around https://github.com/jruby/jruby/issues/3715):
main profile results:
Total time: 88.77
total self children calls method
----------------------------------------------------------------
88.77 0.46 88.31 4000001 Array#each
88.31 1.11 87.20 1000000 LogStash::Filters::Grok#filter
80.29 1.50 78.79 2000000 Hash#each
78.79 1.41 77.39 1000000 LogStash::Filters::Grok#match
75.69 1.08 74.62 1000000 LogStash::Filters::Grok#match_against_groks
72.80 0.81 71.99 1000000 Grok#match_and_capture
39.77 7.88 31.89 11000000 Proc#call
30.60 30.60 0.00 1000000 Regexp#match
25.34 11.93 13.41 10000000 LogStash::Filters::Grok#handle
6.84 6.84 0.00 9000000 LogStash::Event#[]=
5.17 5.17 0.00 10000000 LogStash::Event#[]
4.32 3.81 0.51 3000000 Cabin::Mixins::Logger.debug?
4.28 2.49 1.79 1000000 LogStash::Filters::Base#filter_matched
1.10 1.10 0.00 10000000 MatchData#[]
0.74 0.74 0.00 10000000 Kernel.kind_of?
...
On a side note, big props to @colinsurprenant for the logstash-core-event-java. In my tests using the old ruby version of logstash-core-event made the []/[]= methods take a bigger slice of the whole, making Regexp#match use only ~10/12% of the time.