S3Layer queued writes
## How does this PR change the system?

### Screenshots

### Out of Scope

## How was it tested?
- [ ] Integration tests pass
- [ ] Manual test: new functionality works in the UI
- [ ] Manual test (regression check): creating a component still works

## Does it require a docs change?
- [ ] No
- [ ] Yes, and this PR includes it
- [ ] Yes, and this PR does not include it (reasoning below)
## Metrics!
There are some new panels for inspecting the state of the S3Layer's internal queue, including the queue depth, the current rate-limiting backoff, the successful write rate, and the latency of each write from initial creation to eventual successful S3 write.
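For context on the backoff metric, here is a minimal sketch of the kind of exponential backoff a queued writer might apply after a throttled write; the function name, base, and cap are hypothetical, not the PR's actual implementation:

```python
import random

def next_backoff(current: float, *, base: float = 0.1, cap: float = 30.0) -> float:
    """Double the backoff after a throttled (503) write, up to a cap,
    with a little jitter so queued writers don't retry in lockstep."""
    if current <= 0:
        current = base
    else:
        current = min(current * 2, cap)
    # Jitter of up to 10% of the current backoff.
    return current + random.uniform(0, current / 10)

# A worker would sleep for next_backoff(...) after each 503 and reset
# the backoff to zero after a successful S3 write.
```

The per-queue backoff value this produces is exactly the sort of number the new panel graphs over time.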
The graphs in the screenshot below were generated in my dev environment by sticking a simple HTTP proxy between the services and versitygw. The proxy would limit ObjectPut to 5 requests/second globally, and return a 503 when requests exceed that rate.
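The proxy's throttling behavior can be approximated with a simple token bucket. This is a hypothetical sketch of the per-request check such a proxy performs, not the actual proxy code:

```python
import time

class RateLimiter:
    """Allow at most `rate` requests per second globally; excess gets a 503."""

    def __init__(self, rate: float):
        self.rate = rate
        self.tokens = rate
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at one second's worth.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds with HTTP 503

limiter = RateLimiter(5)  # 5 ObjectPut requests/second, as in the dev test
```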
Dependency Review
✅ No vulnerabilities or OpenSSF Scorecard issues found.

Scanned Files: None
/try
Okay, starting a try! I'll update this comment once it's running... 🚀 Try running here! 🚀
/try
Okay, starting a try! I'll update this comment once it's running... 🚀 Try running here! 🚀
I'm curious: is it not possible to do S3 writes in parallel for better throughput? Although I suppose we're going to be doing lots of parallel writes across every workspace in the whole service, so maybe a single queue per layer cache makes sense.
Next steps to increase throughput that I can think of:
- Rate-limited queue per prefix (S3 scales up write speeds based on the unique key prefix)
- Rate-limited parallel writes per queue
These would mainly come into play after S3 has scaled up to the point where we're no longer getting any 503s with the current serial, single per-cache queue, though.
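The per-prefix idea above could be sketched roughly like this (all names hypothetical; this only shows the routing, since S3 partitions request throughput by key prefix):

```python
from collections import deque

class PrefixQueues:
    """Route each pending write to a queue keyed by its S3 key prefix,
    so each prefix can be drained (and rate limited) independently."""

    def __init__(self, prefix_len: int = 2):
        self.prefix_len = prefix_len
        self.queues: dict[str, deque] = {}

    def enqueue(self, key: str, payload: bytes) -> str:
        prefix = key[: self.prefix_len]
        self.queues.setdefault(prefix, deque()).append((key, payload))
        return prefix

    def drain_one(self, prefix: str):
        """A per-prefix worker would pop from here at its own rate limit."""
        q = self.queues.get(prefix)
        return q.popleft() if q else None
```

Each prefix queue could then own its own rate limiter and, later, its own pool of parallel writers.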
/try
Okay, starting a try! I'll update this comment once it's running... 🚀 Try running here! 🚀