collector
Prevent concurrent runs of the full snapshot collector
When the Postgres database is overloaded or the collector instance is under-provisioned, a full snapshot may not finish before the next full snapshot is scheduled to begin. The two runs then overlap, which can result in corrupted snapshots as well as resource contention that leads to an out-of-memory event.
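As a rough illustration of the idea (not the code in this PR), one way to keep a scheduled run from starting while the previous one is still in progress is an atomic "running" flag. The `snapshotGuard` type and its `tryRun` method are hypothetical names used only for this sketch.

```go
package main

import "sync/atomic"

// snapshotGuard is a hypothetical helper, not part of the collector.
// running is 1 while a full snapshot is in progress and 0 otherwise.
type snapshotGuard struct {
	running int32
}

// tryRun calls fn unless a previous run is still in progress.
// It reports whether fn was actually run.
func (g *snapshotGuard) tryRun(fn func()) bool {
	// Atomically flip 0 -> 1; if the flag is already 1, skip this run.
	if !atomic.CompareAndSwapInt32(&g.running, 0, 1) {
		return false
	}
	// Clear the flag on every return path, including a panic in fn.
	defer atomic.StoreInt32(&g.running, 0)
	fn()
	return true
}
```

A scheduler loop could call `guard.tryRun(collectFullSnapshot)` on each tick (where `collectFullSnapshot` is again an assumed name) and treat a `false` result as a skipped run that should be logged or reported.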
It would be good to report this in the next full snapshot so we can alert customers to the issue. It looks like CollectorErrors is filled in by logger.PrintError, but it's not clear whether the same is true for prefixedLogger.PrintError.
The correctness of this code needs review. For example, I think the early return is wrong because wg.Done() is never called on that path, so anything waiting on wg.Wait() would block forever.
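For reference, the usual way to avoid this class of bug is to defer the `Done` call at the top of the goroutine so that it runs on every return path. The sketch below is a generic illustration under that assumption, not the code under review; `collectOne`, `collectAll`, and the `skip` map are hypothetical.

```go
package main

import "sync"

// collectOne is a hypothetical stand-in for per-server collection work.
func collectOne(wg *sync.WaitGroup, name string, skip bool, results chan<- string) {
	// Deferred first, so it runs even when we return early below.
	defer wg.Done()
	if skip {
		return // early return no longer leaves the WaitGroup counter stuck
	}
	results <- name
}

// collectAll starts one goroutine per server and waits for all of them.
func collectAll(servers []string, skip map[string]bool) []string {
	var wg sync.WaitGroup
	// Buffered so goroutines never block on send before Wait returns.
	results := make(chan string, len(servers))
	for _, s := range servers {
		wg.Add(1)
		go collectOne(&wg, s, skip[s], results)
	}
	wg.Wait() // would hang if any goroutine returned without calling Done
	close(results)

	var done []string
	for r := range results {
		done = append(done, r)
	}
	return done
}
```

With the `defer` in place, a skipped server simply contributes no result, and `wg.Wait()` still returns.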