o365beat WARN beater/o365beat.go:249 start XX must be <=YY hrs ago, resetting

WARN beater/o365beat.go:249 start XX must be <=YY hrs ago, resetting

Open rob570 opened this issue 4 years ago • 3 comments

Hi, could anyone please explain me what happens in such case? This is the scenario: Assume I'm asking all events prior to YY hours, let's say 24h and now are 8:00:00 o365beats pulls a request with start time 8:00 of yesterday. But the response arrive at 8:05:00 five minute later the request (my tenant is really busy) Then I've got this warning. I suppose because an event starting at 8:00:00 of yesterday it's older than 8:05:00 minus 24h. Do I loss all the events or only the events older than 24h? Because I'm having a lot of this warning and a huge number of duplicate Id, and I'm guessing if it's related to this case. Thank you for clarifyng me, Roberto

May 24 '20 07:05 rob570

Thank you for the issue, Roberto, this is definitely an case that might require some work to account for how the API is designed (the endpoint for listing available content only supports asking for a <= 24hr window, and no more than 7 days in the past.

TL;DR: In the short term I recommend increasing the period setting in your config file from the default 5 minutes to something like 30m or 60m to give the beat more time to ship logs before it polls for more, while we look into a more durable fix and/or clearer error messages.

The message you're seeing (start xx must be <= yy hrs ago, resetting) does not drop all the events for that span, it just resets the start of the window to the oldest content available (by default 7 days, the max the API supports, or configurable under the content_max_age key). So you'd only lose events outside that content window, in your case, the 5 minutes. In most cases this only happens when the beat is first started, accounting for any small deviation in the time of execution and 7 days ago by Microsoft's reckoning ... most of the time the beat quickly gets into newer content locations and is never on that 7-day boundary again (unless you shut down the beat for a while). In your case, however, you may be seeing it more often because the beat "can't keep up" with your busy tenancy and can't ship all the events it pulls down before it polls again, so it grabs the same batch (or an overlapping batch), which causes your duplication problem.

I'm going to tag this as a bug, because there is a disconnect between the time of each event and the time content becomes available, and retrieving content blobs on a boundary may lead to duplicate events. And the error messages don't clarify why you're in the situation you're in. There are a few things we may need to fix here, including adding a configuration setting to pull less than 24 hours worth of available content (currently hard-coded).

In the short term I recommend increasing the period setting in your config file from the default 5 minutes to something like 30m or 60m to give the beat more time to ship logs before it polls for more. Thanks again!

May 26 '20 20:05 chris-counteractive

Thank you Chris for your beat and your code so well documented. I've changed line 314 of o365beat.go from: interval := 24 * time.Hour to interval := 2 * time.Hour and it's working better even if I continue to receive more than one published event log like:

2020-05-27T08:09:33.197+0200    DEBUG   [beat]  beater/o365beat.go:366  publishing 46 event(s)
2020-05-27T08:09:33.203+0200    DEBUG   [beat]  beater/o365beat.go:418  published blob created 2020-05-21 03:41:37.619 +0000 UTC, last was 2020-05-21 03:41:37.619 +0000 UTC, updating registry
2020-05-27T08:09:33.203+0200    DEBUG   [beat]  beater/o365beat.go:447  putting registry info (2020-05-21 03:41:37.619 +0000 UTC) to /etc/o365beat/o365beat2.state
2020-05-27T08:09:33.378+0200    DEBUG   [beat]  beater/o365beat.go:366  publishing 46 event(s)
2020-05-27T08:09:33.381+0200    DEBUG   [beat]  beater/o365beat.go:418  published blob created 2020-05-21 03:41:37.619 +0000 UTC, last was 2020-05-21 03:41:37.619 +0000 UTC, updating registry
2020-05-27T08:09:33.381+0200    DEBUG   [beat]  beater/o365beat.go:447  putting registry info (2020-05-21 03:41:37.619 +0000 UTC) to /etc/o365beat/o365beat2.state
2020-05-27T08:09:33.630+0200    DEBUG   [beat]  beater/o365beat.go:366  publishing 12 event(s)
2020-05-27T08:09:33.630+0200    DEBUG   [beat]  beater/o365beat.go:418  published blob created 2020-05-21 03:41:51.441 +0000 UTC, last was 2020-05-21 03:41:37.619 +0000 UTC, updating registry
2020-05-27T08:09:33.630+0200    DEBUG   [beat]  beater/o365beat.go:447  putting registry info (2020-05-21 03:41:51.441 +0000 UTC) to /etc/o365beat/o365beat2.state
2020-05-27T08:09:33.854+0200    DEBUG   [beat]  beater/o365beat.go:366  publishing 12 event(s)
2020-05-27T08:09:33.854+0200    DEBUG   [beat]  beater/o365beat.go:418  published blob created 2020-05-21 03:41:51.441 +0000 UTC, last was 2020-05-21 03:41:51.441 +0000 UTC, updating registry
2020-05-27T08:09:33.854+0200    DEBUG   [beat]  beater/o365beat.go:447  putting registry info (2020-05-21 03:41:51.441 +0000 UTC) to /etc/o365beat/o365beat2.state

But now, at least registry is updating and records are flowing. On the Elasticsearch side I solved setting the document Id to the Id given from Microsoft. Thank you again for your support, Roberto

P.S. In the past day I tried modifying poll time (even to 600m) but it didn't helped. I tryed to sync one container (Audit.Exchange, Audit.AzureActiveDirectory...) at time too with poor results

May 27 '20 08:05 rob570

Roberto - so glad you were able to make some progress, and thank you for the additional feedback and clarity. It's going to take a bit to work through what's happening here, I'm not sure exactly why changing that interval produced such a benefit, but I'm glad it did and will get to an answer soon. I'll let you know when I have a durable fix that avoids the duplication you're seeing.

May 28 '20 01:05 chris-counteractive

o365beat o365beat copied to clipboard

WARN beater/o365beat.go:249 start XX must be <=YY hrs ago, resetting

o365beat
o365beat copied to clipboard