Preventing Duplicate Events

Open shwetas-syd opened this issue 4 years ago • 4 comments

We've noticed duplication of events and we're looking at ways to prevent them. I tried adding the add_id processor, but it's not available in the list of processors.

shwetas-syd avatar Mar 31 '20 02:03 shwetas-syd

I'd love to learn more, to determine whether this is a case of the beat repeating content downloads or an artifact of the API itself. I'll check on the add_id processor, but the events themselves should already have unique IDs.

chris-counteractive avatar Mar 31 '20 16:03 chris-counteractive

Debug logs show the beat querying the artifact and publishing events from the same date range multiple times, so I suspect o365beat isn't getting an acknowledgment from Elastic in time. Elastic recommends the add_id processor to prevent data duplication: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-deduplication.html

shwetas-syd avatar Apr 01 '20 05:04 shwetas-syd

Hi, I think it's not an o365beat issue. This is my pipeline: o365beat --> logstash (with geo info enrichment by a filter) --> output to file and to ES. I see duplicate events in the file too. I solved it on the ES side by mapping document_id to the O365 "Id" field in the logstash conf.
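
For anyone wanting to try the same workaround, a minimal sketch of that mapping in the Logstash output section might look like the following. The hosts and index values are placeholders I've assumed for illustration, and the field reference assumes the O365 "Id" field arrives at the top level of the event under that exact name; adjust it if your events nest or rename it.

```
output {
  elasticsearch {
    # Placeholder endpoint and index name (assumptions, not from this thread)
    hosts => ["http://localhost:9200"]
    index => "o365beat-%{+YYYY.MM.dd}"

    # Use the O365 event's own "Id" as the Elasticsearch document ID, so a
    # re-sent event overwrites the existing document instead of adding a new one
    document_id => "%{Id}"
  }
}
```

This only deduplicates within a single index, and it does nothing for the file output, which will still contain the repeated events.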

rob570 avatar May 14 '20 05:05 rob570

I'm thinking these duplicate events could be part of the same underlying issue described in my recent reply to @rob570's issue. I'll let you know when a fix is posted, and hopefully we can test it under your conditions!

chris-counteractive avatar May 26 '20 20:05 chris-counteractive