Launcher is somewhat fragile around log buffering

Open directionless opened this issue 2 years ago • 0 comments

Osquery produces logs and sends them to launcher, which immediately writes them to boltdb. Asynchronously, a process reads a chunk of them from boltdb and sends them to us. This process is fairly straightforward, but it contains a theoretical bug (which has impacted us at least twice).

To wit, if the time it takes to send a batch of logs exceeds our server's max connection time (30s), the send will fail and retry. When this happens, launcher will effectively use all the network bandwidth trying to resend that log batch. This is most likely in places with exceptionally bad internet.

The code in question is at: https://github.com/kolide/launcher/blob/a322303d9f9ddd13ce1a505acc2ae3161b38216a/pkg/osquery/extension.go#L620

I think, in an ideal world, we should have some dynamic adjustment of the MaxBytesPerBatch setting. I don't know if we should make it some exponential backoff sort of thing, or if we should track how long it takes to send a batch, and adjust based on that. Happy to hear suggestions.
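
To make the second option concrete, here is a rough sketch of what "track how long a send takes and adjust" could look like. This is not launcher's code: `batchSizer`, `newBatchSizer`, `observe`, `defaultTarget`, and `errSendFailed` are all hypothetical names, and the 15s target (half of the 30s server limit), the halving on a slow or failed send, and the 25% growth on a fast one are assumptions that would need tuning.

```go
package main

import (
	"errors"
	"time"
)

// defaultTarget is an assumed goal for one batch send: half of the
// 30s server connection limit mentioned in this issue.
const defaultTarget = 15 * time.Second

// errSendFailed is a hypothetical stand-in for whatever error a failed
// or timed-out send returns.
var errSendFailed = errors.New("send failed")

// batchSizer is a hypothetical helper that picks the byte limit for the
// next batch, based on how the previous send went.
type batchSizer struct {
	min, max, current int
	target            time.Duration
}

// newBatchSizer starts at max, which would be the configured
// MaxBytesPerBatch, and never goes below min.
func newBatchSizer(min, max int, target time.Duration) *batchSizer {
	return &batchSizer{min: min, max: max, current: max, target: target}
}

// observe records the outcome of one send and returns the byte limit
// to use for the next batch.
func (b *batchSizer) observe(elapsed time.Duration, err error) int {
	switch {
	case err != nil || elapsed > b.target:
		// Failed or too slow: halve the batch so the retry is smaller.
		b.current /= 2
	case elapsed < b.target/2:
		// Comfortably fast: grow by 25% back toward the configured max.
		b.current += b.current / 4
	}
	// Clamp to [min, max].
	if b.current < b.min {
		b.current = b.min
	}
	if b.current > b.max {
		b.current = b.max
	}
	return b.current
}
```

The send loop would call `observe` after each attempt with the measured duration and any error, then use the returned value in place of the fixed MaxBytesPerBatch when reading the next chunk from boltdb. The point is that a retry after a timeout goes out with a smaller batch, instead of resending the same oversized one.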

Along the way, maybe look at #705 and #1300

directionless, Aug 30 '23 01:08