logstash-logback-encoder icon indicating copy to clipboard operation
logstash-logback-encoder copied to clipboard

Avoid generating JSON with duplicate keys

Open nhmarujo opened this issue 4 years ago • 7 comments

Describe the bug Right now if someone adds a logstash marker with the same name as one of the default fields, it will generate a JSON with the two, which doesn't comply with JSON rules (no duplicate fields are allowed).

To Reproduce Create a log line with a logstash marker with a name you know will collide with a default one and then observe the generated JSON. Example:

log.info(append("message", "This will generate another key 'message" on the JSON"), "This will be added on 'message' by default");

Expected behavior Having an invalid JSON (with duplicate keys) will result in a log ingestion issue with many system (had it with fluentbit). I would expect that, when constructing the JSON, any fields that would result in duplicate keys to be dropped (only the first added would be included). Additionally I would suggest some sort of warning to be printed, to raise people's attention to the mapping mistake they introduced, so they would fix it ASAP.

  • logstash-logback-encoder 7.0.1
  • logback 1.2.7
  • jackson 2.13.0
  • java 17

nhmarujo avatar Dec 20 '21 18:12 nhmarujo

Worth checking if the same won't happen with a field added from automatically from the MDC, for instance. Bottom line - with any field being added.

nhmarujo avatar Dec 21 '21 08:12 nhmarujo

Hi Nuno,

Glad you raised this issue... funny enough we were debating about this exact same issue last week with @philsttr and decided to wait until someone complains about it ;-)

Jackson already comes with a similar feature: JsonGenerator#Feature#STRICT_DUPLICATE_DETECTION. This feature is disabled by default but can easily be activated with a FeatureJsonGeneratorDecorator as described in https://github.com/logfellow/logstash-logback-encoder#customizing-json-factory-and-generator. When this feature is enabled Jackson throws an exception when attempting to write a duplicate property. This will cause LLE to drop the current event and emit a WARN logback status. This is of course not (exactly) what you expect... but at least you know from the statuses if you have duplicate keys.

We thought about different strategies on how to handle duplicates:

  • keep the first and drop subsequent duplicate entries
  • serialize duplicate entries under a different name (e.g. with an incremental suffix)
  • drop the entire event
  • etc

This strategy operates at runtime. Some duplicates may be detected earlier at configuration time (two JsonProviders with the same fieldName...). These cases should ideally be detected during startup as well.

Any additional thought? Your feedback is welcome...

PS: yes, MDC entries can cause duplicates as well...

brenuart avatar Dec 21 '21 09:12 brenuart

Hi Bertrand,

Really happy to see such a warm reception to this issue 🙂

I think detecting these during startup should be really difficult since only when you try to build the messages you will be able to pinpoint if the issue will occur or not, no?

I think enabling JsonGenerator#Feature#STRICT_DUPLICATE_DETECTION, although not ideal, would already be better than the current behaviour. We would lose the log message, but we would get a warning to state what went wrong.

What I really fear is silent failure, which is what we experienced. In our case, only a single log line contained that issue, but it was enough to make almost 100% of the logs to be dropped and we were clueless about it. Why? Well, turns out fluentbit processes logs in batches and a single of these log entries inside one batch was enough to make the entire batch to fail and be dropped. And these log lines were quite frequent, so the chances of having at least one in each batch was quite high. If instead of having the logs with the duplicate entries fields we had a warning, I'm sure we would be able to detect it sooner 🙂

Honestly I like the incremental suffix idea. Is the one that presents most advantages:

  1. you don't lose the log entry
  2. you don't lose the marker/mdc entry/whatever
  3. it is clear, by seeing the suffix, that something is wrong

Wondering if it would still be worth to raise a warning in that situation anyway....

nhmarujo avatar Dec 21 '21 09:12 nhmarujo

I'm 100% with you... Checking for duplicates has a cost and may reduce performance a bit. This feature would probably be disabled by default.

I'll try to focus on this issue as soon as I have some time. Stay tuned.

brenuart avatar Dec 21 '21 09:12 brenuart

Thank you so much @brenuart . I will be watching closely 😉

nhmarujo avatar Dec 21 '21 09:12 nhmarujo

Hello, any news on the issue?

OksiBlack avatar Sep 03 '23 06:09 OksiBlack

we faced the same problem, any update?

antoniosg88 avatar Jan 29 '24 12:01 antoniosg88