sentry icon indicating copy to clipboard operation
sentry copied to clipboard

Events only getting halfway through ingest causing ML grouping weirdness

Open lobsterkatie opened this issue 1 year ago • 0 comments

Documenting for posterity:

We recently ran into a situation where it seemed as if a) we were ignoring Seer results and creating a new group even when Seer found a match, and b) sending the same hash to Seer multiple times.

But TL;DR, here is what actually happened:

  • event A (with hash 1) put hash 1 into Seer, but disappeared before it could create a group for hash 1
  • event B (with hash 2) matched hash 1 in Seer, but disappeared before it could create a group for hash 2 or be put into a group for hash 1
  • event C (with hash 2) matched hash 1 in Seer, but no hash 1 group existed, so it created a group for hash 2
  • event D (with hash 1) had no Seer match, so it created a group for hash 1

The more detailed explanation, with links below:

The sequence of events appeared to be this (where the starred lines seemed to indicate bugs):

  • Event A with hash 1
    • sent to seer
    • seer found no match
    • hash 1 added to seer DB
    • sentry created group 1 for hash 1 events
  • Event B with hash 2
    • sent to seer
    • seer matched hash 2 to hash 1
    • based on seer result, sentry should have put event into group 1, but instead created group 2 for hash 2 events *****
  • Event C with hash 2
    • based on hash, sentry should have put event into group 2, but instead sent it to seer *****
    • seer again matched hash 2 to hash 1
    • based on seer result, sentry should have put event into group 1, but instead put it in group 2 *****
  • Event D with hash 1
    • based on hash, sentry should have put event into group 1, but instead sent it to seer *****
    • seer found no match (seems you can't match to yourself?)
    • sentry put event into group 1

But it turned out the sequence was actually this (where the starred lines indicate the actual problem):

  • Event A with hash 1
    • sent to seer
    • seer found no match
    • hash 1 added to seer DB
    • event got lost partway through ingest, so group 1 wasn't created *****
  • Event B with hash 2
    • sent to seer
    • seer matched hash 2 to hash 1
    • event got lost partway through ingest, so it was neither assigned to group 1 (which would have been hard in any case, since group 1 didn't exist at that point) nor used to create group 2 *****
  • Event C with hash 2
    • group 2 didn't yet exist, so event sent to seer
    • seer again matched hash 2 to hash 1
    • based on seer result, sentry should have put event into group 1, but group 1 still didn't yet exist, so event was used to create group 2
  • Event D with hash 1
    • group 1 didn't yet exist, so event sent to seer
    • seer found no match (seems you can't match to yourself?)
    • sentry used event to create group 1

We know what was sent to seer, and what seer sent back, based on querying the logs for requests relating to hashes 1 and 2. We know events A and B got lost because searching for them in the issue stream yields no results. And we know that events C and D created groups 2 and 1, respectively, because they are each the oldest event in their issue, even though events A and B came in first.

A word of caution: Just to make this all more confusing, if you click on "oldest" in group 2, you actually get the second-oldest event, not event C, which is the true oldest event. Once we start recording timestamps with millisecond precision, presumably this problem will go away.

lobsterkatie avatar Aug 08 '24 19:08 lobsterkatie