graylog2-server

Codec is initialised on each incoming message

Open papirosko opened this issue 2 years ago • 5 comments

Expected Behavior

The codec should be initialised once, when the MessageInput is initialised (e.g. gelfCodecFactory.create(configuration) in GELFHttpInput:42). Instead, it is initialised for each message in DecodingProcessor:142:

        final Codec codec = factory.create(raw.getCodecConfig());

Possible Solution

The problem is that a codec can require heavy initialisation (e.g. reading format details from configuration files), which degrades overall Graylog performance under heavy traffic. The instance created during input initialisation should be reused instead.

One possible solution is to let the codec author choose how the codec is instantiated:

  • per message
  • singleton
  • pool of instances

If the codec needs configuration details (which are stored in the message, as far as I can see), those details should be passed to the decode() method.
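A minimal sketch of the "singleton" option, with per-message settings passed to decode(), might look like the following. `ReusableCodec`, `INIT_COUNT` and the `prefix` setting are hypothetical names made up for illustration; this is not Graylog's Codec interface, only a stand-in that shows one instance serving many messages.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a codec with costly setup (not Graylog's Codec interface).
final class ReusableCodec {
    // Counts how often the expensive setup ran; for illustration only.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private final Map<String, String> defaults;

    ReusableCodec(Map<String, String> defaults) {
        // Imagine reading format details from configuration files here.
        INIT_COUNT.incrementAndGet();
        this.defaults = new HashMap<>(defaults);
    }

    // Per-message settings arrive as an argument, so the instance itself
    // is never mutated and can be shared between messages.
    String decode(String payload, Map<String, String> perMessageConfig) {
        String fallback = defaults.getOrDefault("prefix", "");
        String prefix = perMessageConfig.getOrDefault("prefix", fallback);
        return prefix + payload.trim();
    }
}
```

Because the instance holds no per-message state, the expensive constructor runs once no matter how many messages are decoded, and a message that carries its own configuration only affects its own decode() call.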

Your Environment

graylog in docker

  • Graylog Version: 4.3.5
  • Java Version: 1.8

papirosko avatar Sep 02 '22 06:09 papirosko

@papirosko Thank you for the report. Do you have measurements of specific inputs where this is an issue?

We initialize the codec for each message because each message can have a different codec configuration.

bernd avatar Sep 05 '22 10:09 bernd

The only measurement we have is high CPU utilisation (up to 20%) on each of the 3 nodes. This also made regular searches slow.

I believe the codec should have a defined lifecycle:

  • create
  • init
  • process(potential config as param)
  • close

This way the codec could cache most of its implementation details in the init stage and override them with custom config in the process stage (which should not happen often).

Potentially, you could have 2 pools of codecs: one for new incoming messages (which shouldn't override the config at all) and one for already stored messages (or whatever case requires overriding the config).
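The create / init / process / close lifecycle could be sketched roughly as below. `LifecycleCodec`, `TaggingCodec` and the `source` setting are hypothetical names used only for illustration; nothing here exists in Graylog, and the pooling idea is left out to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract for the create / init / process / close stages.
// It is not Graylog's Codec interface.
interface LifecycleCodec extends AutoCloseable {
    void init(Map<String, String> baseConfig);

    String process(String payload, Map<String, String> overrides);

    @Override
    void close(); // narrowed so callers need not handle a checked exception
}

// Toy implementation that tags each payload with a configured source name.
final class TaggingCodec implements LifecycleCodec {
    private Map<String, String> base = new HashMap<>();
    private boolean open = false;

    @Override
    public void init(Map<String, String> baseConfig) {
        // Expensive, one-off work would be done and cached here.
        base = new HashMap<>(baseConfig);
        open = true;
    }

    @Override
    public String process(String payload, Map<String, String> overrides) {
        if (!open) {
            throw new IllegalStateException("codec is not initialised");
        }
        // Overrides win over the cached base config, for this call only.
        String fallback = base.getOrDefault("source", "unknown");
        String source = overrides.getOrDefault("source", fallback);
        return source + ":" + payload;
    }

    @Override
    public void close() {
        // Release cached state; the codec must be initialised again before reuse.
        base = new HashMap<>();
        open = false;
    }
}
```

In this sketch the override map is consulted on every call but never stored, so a message with a custom config cannot change what later messages see.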

papirosko avatar Sep 06 '22 10:09 papirosko

To clarify: I created my own input that works with my own message format, and that is what caused the problems.

papirosko avatar Sep 07 '22 06:09 papirosko

Thanks for the feedback. There will be no short-term change in the behavior because it's more involved than it seems. As a workaround, if you build your own input and have problems with expensive codec initialization, you can extract the expensive logic into a separate class and instantiate the object as a singleton in Guice.

bernd avatar Sep 07 '22 17:09 bernd

As a workaround I simply use static fields of a class (the same idea).
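A rough sketch of the static-field variant, using plain Java and the lazy holder idiom, is shown below. `FormatDetails`, `CheapCodec`, `LOADS` and the comma delimiter are hypothetical names and values chosen for illustration; the actual custom input is not shown in this thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for the expensive, shared part of a custom codec.
final class FormatDetails {
    // Counts how often the expensive load ran; for illustration only.
    static final AtomicInteger LOADS = new AtomicInteger();

    private final String delimiter;

    private FormatDetails() {
        // Imagine parsing format description files here.
        LOADS.incrementAndGet();
        this.delimiter = ",";
    }

    // The JVM initialises Holder lazily and thread-safely on first access.
    private static final class Holder {
        static final FormatDetails INSTANCE = new FormatDetails();
    }

    static FormatDetails get() {
        return Holder.INSTANCE;
    }

    String[] split(String payload) {
        return payload.split(delimiter);
    }
}

// Creating this per message stays cheap because the heavy state is shared.
final class CheapCodec {
    private final FormatDetails details = FormatDetails.get();

    String[] decode(String payload) {
        return details.split(payload);
    }
}
```

With this arrangement the codec can still be created once per message, as the decoding path currently does, while the costly setup runs only on first use.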

papirosko avatar Sep 08 '22 04:09 papirosko