versitygw
[Bug] - shutting down gateway can lead to missed logs/events
Describe the bug This was noticed while reading the code and has not been reported in practice yet. The s3log/webhook, s3event/kafka, s3event/nats, and s3event/webhook implementations each send their messages in a separate goroutine. We currently have no way to flush these messages on shutdown, so in-flight logs/events can be lost when the gateway terminates.
To Reproduce Unknown. Probably racy: generate a lot of events and then terminate the gateway.
Expected behavior Gateway termination should wait for all outstanding logs/events to complete.
Proposed Fix We should not be creating a goroutine for each message sent. Instead we should have a fixed-size goroutine pool that is fed by a buffered channel. The pool allows messages to be sent with some parallelism, while the size of the buffered channel limits the number of outstanding requests. The channel could also be unbuffered if desired.
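A minimal sketch of the pool described above, not taken from the versitygw code. The names `pool`, `newPool`, `Enqueue`, and `sendFunc` are hypothetical, and `sendFunc` stands in for whatever actually delivers a message (webhook POST, Kafka write, NATS publish).

```go
package main

import (
	"fmt"
	"sync"
)

// sendFunc is a hypothetical stand-in for the code that delivers one message.
type sendFunc func(msg []byte) error

// pool is a fixed set of workers fed by a single channel.
type pool struct {
	msgs chan []byte
	wg   sync.WaitGroup
}

// newPool starts `workers` goroutines reading from a channel that can hold
// `queueLen` pending messages (0 gives an unbuffered channel).
func newPool(workers, queueLen int, send sendFunc) *pool {
	p := &pool{msgs: make(chan []byte, queueLen)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			// The loop ends once the channel is closed and fully drained.
			for m := range p.msgs {
				if err := send(m); err != nil {
					fmt.Println("send failed:", err)
				}
			}
		}()
	}
	return p
}

// Enqueue blocks while the queue is full, which is what bounds the number of
// outstanding requests. It must not be called after Close.
func (p *pool) Enqueue(msg []byte) { p.msgs <- msg }

// Close stops intake and waits for every queued message to be handled.
func (p *pool) Close() {
	close(p.msgs)
	p.wg.Wait()
}

func main() {
	p := newPool(2, 8, func(m []byte) error {
		fmt.Printf("sent %s\n", m)
		return nil
	})
	for i := 0; i < 5; i++ {
		p.Enqueue([]byte(fmt.Sprintf("event-%d", i)))
	}
	p.Close() // returns only after all five messages were handed to send
}
```

With this shape, a full queue applies backpressure to the request path rather than spawning more goroutines. Whether blocking or dropping is preferable when the queue is full is a separate design decision.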
We may also need a context that limits the amount of time to wait when sending messages, to prevent everything from getting hung up on a non-responsive messaging service. In this case we may want to error out the gateway completely if there are problems with logs/events getting sent. The admin could reconfigure without the specific logging/events if the service is down but the gateway needs to continue to run. We still need sufficient logging for any errors that happen while we are shutting down, though.
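One way the per-message time limit could look, as a sketch under assumptions: `timedSender`, `newTimedSender`, `sendFunc`, and the two example senders are hypothetical, and the real senders would need to accept and honor a `context.Context` for the timeout to take effect.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// sendFunc is a hypothetical delivery function that honors ctx cancellation.
type sendFunc func(ctx context.Context, msg []byte) error

// timedSender bounds how long a single send may take.
type timedSender struct {
	timeout time.Duration
	send    sendFunc
}

func newTimedSender(timeout time.Duration, send sendFunc) *timedSender {
	return &timedSender{timeout: timeout, send: send}
}

// Send gives the underlying send at most s.timeout to finish.
func (s *timedSender) Send(msg []byte) error {
	ctx, cancel := context.WithTimeout(context.Background(), s.timeout)
	defer cancel()
	if err := s.send(ctx, msg); err != nil {
		return fmt.Errorf("send message: %w", err)
	}
	return nil
}

// timedOut reports whether err was caused by the deadline expiring.
func timedOut(err error) bool { return errors.Is(err, context.DeadlineExceeded) }

// unresponsiveSend simulates a messaging service that never answers.
func unresponsiveSend(ctx context.Context, _ []byte) error {
	<-ctx.Done()
	return ctx.Err()
}

// healthySend simulates a service that accepts the message immediately.
func healthySend(_ context.Context, _ []byte) error { return nil }

func main() {
	s := newTimedSender(50*time.Millisecond, unresponsiveSend)
	if err := s.Send([]byte("event")); timedOut(err) {
		// The caller decides here whether to log, retry, or fail the gateway.
		fmt.Println("giving up on unresponsive service:", err)
	}
}
```

The sketch only surfaces the timeout as an error. Whether that error should take down the gateway or just be logged is the policy question raised above.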
The affected lines are:
https://github.com/versity/versitygw/blob/778528895761b1b22c14a2c7d6e524194c5456cf/s3log/webhook.go#L134
https://github.com/versity/versitygw/blob/778528895761b1b22c14a2c7d6e524194c5456cf/s3event/kafka.go#L99
https://github.com/versity/versitygw/blob/778528895761b1b22c14a2c7d6e524194c5456cf/s3event/kafka.go#L107
https://github.com/versity/versitygw/blob/778528895761b1b22c14a2c7d6e524194c5456cf/s3event/nats.go#L86
https://github.com/versity/versitygw/blob/778528895761b1b22c14a2c7d6e524194c5456cf/s3event/nats.go#L94
https://github.com/versity/versitygw/blob/778528895761b1b22c14a2c7d6e524194c5456cf/s3event/webhook.go#L98
https://github.com/versity/versitygw/blob/778528895761b1b22c14a2c7d6e524194c5456cf/s3event/webhook.go#L106
The AuditLogger interface has a Shutdown() method: https://github.com/versity/versitygw/blob/778528895761b1b22c14a2c7d6e524194c5456cf/s3log/audit-logger.go#L31 That could be used to close the channel and wait for it to drain before completing.
Similarly, the S3EventSender interface has a Close() method that could be used in the same way: https://github.com/versity/versitygw/blob/778528895761b1b22c14a2c7d6e524194c5456cf/s3event/event.go#L29
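A sketch of how a Shutdown()/Close() implementation might close the channel and wait for it to drain without hanging forever. The `eventSender` type, its fields, and the `Close() error` signature are assumptions for illustration and are not copied from the interfaces linked above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errDrainTimeout = errors.New("timed out waiting for queued events to drain")

// eventSender is a hypothetical sender backed by a worker pool.
type eventSender struct {
	msgs      chan []byte
	wg        sync.WaitGroup
	closeOnce sync.Once
	drainWait time.Duration // upper bound on how long Close waits
}

func newEventSender(workers, queueLen int, drainWait time.Duration, send func([]byte)) *eventSender {
	s := &eventSender{msgs: make(chan []byte, queueLen), drainWait: drainWait}
	for i := 0; i < workers; i++ {
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			for m := range s.msgs {
				send(m)
			}
		}()
	}
	return s
}

// Send queues a message. It must not be called after Close.
func (s *eventSender) Send(msg []byte) { s.msgs <- msg }

// Close stops intake, then waits up to drainWait for the workers to finish.
// On timeout it reports how many messages were still queued, so the loss is
// at least visible in the shutdown logs.
func (s *eventSender) Close() error {
	s.closeOnce.Do(func() { close(s.msgs) })

	done := make(chan struct{})
	go func() {
		s.wg.Wait()
		close(done)
	}()

	select {
	case <-done:
		return nil
	case <-time.After(s.drainWait):
		return fmt.Errorf("%w: %d still queued", errDrainTimeout, len(s.msgs))
	}
}

func main() {
	s := newEventSender(2, 8, 2*time.Second, func(m []byte) {
		fmt.Printf("sent %s\n", m)
	})
	for i := 0; i < 3; i++ {
		s.Send([]byte(fmt.Sprintf("event-%d", i)))
	}
	if err := s.Close(); err != nil {
		fmt.Println("shutdown:", err)
	}
}
```

The gateway's shutdown path would then call this after it stops accepting requests, so that no new messages are queued once the channel is closed.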