graylog2-server
Codec is initialised on each incoming message
Expected Behavior
The codec should be initialised once, when the MessageInput is initialised (e.g. gelfCodecFactory.create(configuration) in GELFHttpInput:42).
Instead, it is initialised on each message in DecodingProcessor:142:
final Codec codec = factory.create(raw.getCodecConfig());
Possible Solution
The problem is that a codec can contain heavy initialisation (e.g. reading format details from configuration files), which degrades overall Graylog performance under heavy traffic. The instance created during input initialisation should be reused instead.
One possible solution is to make it selectable how the codec is used:
- per message
- singleton
- pool of instances
If the codec needs to know some configuration details (which, as far as I can see, are stored in the message), those details should be passed to the decode() method.
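To illustrate the difference between the "per message" and "singleton" options, here is a minimal sketch. `ExpensiveCodec`, `CodecStrategies` and the `charsetHint` parameter are hypothetical names made up for this example and are not part of the Graylog API; the constructor stands in for whatever heavy initialisation a real codec does.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical codec whose constructor stands in for expensive setup
// (e.g. reading format details from configuration files).
final class ExpensiveCodec {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    ExpensiveCodec() {
        INIT_COUNT.incrementAndGet(); // counts how often the heavy setup runs
    }

    // Per-message details arrive as a parameter instead of via the constructor.
    String decode(String raw, String charsetHint) {
        return "[" + charsetHint + "] " + raw.trim();
    }
}

final class CodecStrategies {
    private CodecStrategies() {}

    // "Per message": every call to get() builds a fresh codec.
    static Supplier<ExpensiveCodec> perMessage() {
        return ExpensiveCodec::new;
    }

    // "Singleton": one codec is built up front and shared by all messages.
    static Supplier<ExpensiveCodec> singleton() {
        final ExpensiveCodec shared = new ExpensiveCodec();
        return () -> shared;
    }
}
```

With the singleton supplier the heavy setup runs once no matter how many messages are decoded, provided the codec itself is safe to share between threads.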
Your Environment
graylog in docker
- Graylog Version: 4.3.5
- Java Version: 1.8
@papirosko Thank you for the report. Do you have measurements of specific inputs where this is an issue?
We initialize the codec for each message because each message can have a different codec configuration.
The only measurement we saw was the high CPU utilisation (up to 20%) on each node (3 nodes). This caused the regular searches also to be slow.
I believe the codec should have a defined lifecycle:
- create
- init
- process(potential config as param)
- close
This way the codec could cache most of its implementation details in the init stage and override them with custom config in the process stage (which should not happen often).
Potentially, you could have two pools of codecs: one for new incoming messages (which shouldn't override the config at all) and one for already stored ones (or whatever the case is in which you override the config).
To clarify, I created my own input that works with my own message format, and that is what caused the problems.
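The lifecycle proposed above (create, init, process, close) could look roughly like the following. This is only a sketch under assumptions: `LifecycleCodec`, `DelimiterCodec` and the `delimiter` setting are invented for illustration and do not exist in Graylog.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical lifecycle-aware codec interface; not a Graylog type.
interface LifecycleCodec extends AutoCloseable {
    void init(Map<String, String> defaults);                      // heavy setup, once
    String process(String raw, Map<String, String> overrides);    // per message
    @Override
    void close();                                                 // release cached state
}

final class DelimiterCodec implements LifecycleCodec {
    private Map<String, String> defaults = Collections.emptyMap();
    int initCalls = 0;

    @Override
    public void init(Map<String, String> defaults) {
        initCalls++;
        // Cache an immutable copy so process() can reuse it cheaply.
        this.defaults = Collections.unmodifiableMap(new HashMap<>(defaults));
    }

    @Override
    public String process(String raw, Map<String, String> overrides) {
        Map<String, String> effective = defaults;
        if (!overrides.isEmpty()) {
            // Rare path: merge per-message overrides on top of cached defaults.
            effective = new HashMap<>(defaults);
            effective.putAll(overrides);
        }
        String delimiter = effective.getOrDefault("delimiter", ",");
        return String.join("|", raw.split(Pattern.quote(delimiter)));
    }

    @Override
    public void close() {
        defaults = Collections.emptyMap();
    }
}
```

The common path (no overrides) touches only the state cached in `init`, so the cost of a per-message override is paid only by the messages that actually carry one.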
Thanks for the feedback. There will be no short-term change in the behavior because it's more involved than it seems. As a workaround, if you build your own input and have problems with expensive codec initialization, you can extract the expensive logic into a separate class and instantiate the object as a singleton in Guice.
As a workaround, I simply use static fields of a class (the same idea).
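For anyone hitting the same problem, the static-field workaround can be sketched with the initialization-on-demand holder idiom, which relies only on the JVM's class-loading guarantees. `FormatDefinitions` and `MyCodec` are hypothetical names, and the hard-coded tab separator stands in for whatever would really be read from configuration files; a Guice singleton binding would serve the same purpose.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

// Hypothetical holder for the expensive, shareable part of a custom codec.
final class FormatDefinitions {
    static final AtomicInteger LOADS = new AtomicInteger();

    private final String fieldSeparator;

    private FormatDefinitions() {
        LOADS.incrementAndGet();   // stands in for reading configuration files
        this.fieldSeparator = "\t"; // assumed value for the example
    }

    // The JVM initialises Holder lazily and exactly once, thread-safely.
    private static final class Holder {
        static final FormatDefinitions INSTANCE = new FormatDefinitions();
    }

    static FormatDefinitions get() {
        return Holder.INSTANCE;
    }

    String fieldSeparator() {
        return fieldSeparator;
    }
}

// The codec itself stays cheap to construct, so creating it per message is fine.
final class MyCodec {
    private final FormatDefinitions formats = FormatDefinitions.get();

    String[] decode(String raw) {
        return raw.split(Pattern.quote(formats.fieldSeparator()));
    }
}
```

Because the shared state lives outside the codec, the per-message `factory.create(...)` call in DecodingProcessor only pays for a trivial constructor.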