graylog2-server
graylog2-server copied to clipboard
remove_field pipeline function performance regression
Adding support for removing fields with regular expressions (https://github.com/Graylog2/graylog2-server/pull/15131) can have a huge performance impact on messages with many fields.
What used to be a O(1) remove from a hash map, is now looping over every message field O(n)
to then perform a regex match.
Example benchmark on my system
Messages with 100 fields.
A pipeline rule with 20 remove_field calls. (20 is a realistic value, I've seen setups with 50)
Now:
org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.executionTime:
mean: 336μs
throughput: 2500 msg/sec
Regex change reverted:
org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.executionTime:
mean: 5μs
throughput: 11000 msg/sec
Expected Behavior
This change should've done by introducing a new pipeline function e.g. remove_fields (plural)
Current Behavior
The current remove_field function creates a performance regression for lots of users, and also changes the behavior unexpectedly:
https://github.com/Graylog2/graylog2-server/issues/16047
Possible Solution
Change this is back the way it was is not possible without introducing a breaking change for some users. I suggest we restore the previous behavior (supporting a single field only) and move the new functionality to a new function.
Users that already use the regex match will need to be notified and adopt their rules.
The only current workaround I can think of is replacing multiple calls to remove_field with something like
rename_field("@metadata_beat", "discard");
rename_field("@metadata_filename", "discard");
rename_field("@metadata_version", "discard");
...
remove_field("discard");
Given the severity of this bug, are we sure we just want to put this in the backlog?
As a severe performance issue this should probably be penned into 6.0.1
Introducing a new function for regex-removal also resolves https://github.com/Graylog2/graylog2-server/issues/16047. That issue was due to field names being treated as regex patterns when they were not intended as such.
@mpfz0r There is definitely a slowdown between regex and not. But I don't see the same magnitude of performance impact:
- 20
remove_fieldcalls without regex (for a message of 50 fields): mean 115μs - 20 calls with regex: mean 550μs
This is based on the same PipelineInterpreter.executionTime metric; and is consistent with the 5x reduction in throughput you reported.
For the purpose of assessing the severity of the issue and validity of the fix, I'm curious how you achieved the 5μs value?