S3Layer queued writes

Open jhelwig opened this issue 1 month ago • 1 comments

How does this PR change the system?

Screenshots:

Out of Scope:

How was it tested?

  • [ ] Integration tests pass
  • [ ] Manual test: new functionality works in UI
  • [ ] Manual test: (regression check) creating a component still works

Does it require a docs change?

  • [ ] No
  • [ ] Yes, and this PR includes it
  • [ ] Yes, and this PR does not include it (reasoning below)

Metrics!

There are some new panels to help inspect the state of what's going on with the S3Layer's internal queue, including the queue depth, the current backoff amount for rate limiting, the successful write rate, and how long each write takes from initial creation until eventual successful S3 write.
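
To make those four numbers concrete, here is a minimal sketch of a serial write queue that exposes them. It is not the S3Layer code: `QueuedWriter`, the `put` callable, the doubling backoff policy, and the `base_backoff`/`max_backoff` values are all assumptions made for illustration, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque


class QueuedWriter:
    """Serial write queue that backs off when the store answers 503."""

    def __init__(self, put, base_backoff=0.1, max_backoff=5.0):
        self.put = put                    # hypothetical callable(key, body) -> HTTP status
        self.base_backoff = base_backoff  # assumed value, in seconds
        self.max_backoff = max_backoff    # assumed value, in seconds
        self.backoff = 0.0                # the "current backoff amount"
        self.queue = deque()              # pending (key, body, enqueued_at)
        self.latencies = []               # creation-to-success time per write

    @property
    def depth(self):
        """Queue depth: writes accepted but not yet stored."""
        return len(self.queue)

    def enqueue(self, key, body, now):
        self.queue.append((key, body, now))

    def step(self, now):
        """Try the oldest write once; return how long to wait before the next try."""
        if not self.queue:
            return 0.0
        key, body, enqueued_at = self.queue[0]
        if self.put(key, body) == 503:
            # Rate limited: leave the write at the head and double the delay
            # (assumed policy; the real one may differ).
            self.backoff = min(self.max_backoff, max(self.base_backoff, self.backoff * 2))
            return self.backoff
        # Any other status counts as success in this sketch.
        self.queue.popleft()
        self.latencies.append(now - enqueued_at)
        self.backoff = 0.0
        return 0.0
```

In this sketch the successful write rate would be derived from how quickly `latencies` grows, and each entry in `latencies` corresponds to the creation-to-success duration shown in the last panel.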

The graphs in the screenshot below were generated in my dev environment by sticking a simple HTTP proxy between the services and versitygw. The proxy limited ObjectPut to 5 requests/second globally and returned a 503 whenever requests exceeded that rate.
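
For anyone who wants to reproduce a similar setup, the limiting decision can be sketched roughly like this. It is not the proxy I used: `GlobalRateLimiter` is a hypothetical name, the sliding-window approach is an assumption, and only the decision logic is shown (no HTTP handling).

```python
from collections import deque


class GlobalRateLimiter:
    """Sliding-window limiter: at most `limit` accepted requests per `window` seconds."""

    def __init__(self, limit=5, window=1.0):
        self.limit = limit
        self.window = window
        self._accepted = deque()  # timestamps of accepted requests, oldest first

    def status_for(self, now):
        """Return the HTTP status a proxy would send for a request arriving at `now`."""
        # Forget accepted requests that have aged out of the window.
        while self._accepted and now - self._accepted[0] >= self.window:
            self._accepted.popleft()
        if len(self._accepted) >= self.limit:
            return 503  # over the limit: reject without forwarding
        self._accepted.append(now)
        return 200  # stands in for "forward the request upstream"
```

Rejected requests are not counted against the window here, so a client that keeps hammering the proxy still gets 5 writes/second through.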

[Image: screenshot of the new S3Layer metrics panels]

jhelwig avatar Nov 26 '25 17:11 jhelwig

Dependency Review

✅ No vulnerabilities or OpenSSF Scorecard issues found.

Scanned Files

None

github-actions[bot] avatar Nov 26 '25 17:11 github-actions[bot]

/try

jhelwig avatar Dec 01 '25 19:12 jhelwig

Okay, starting a try! I'll update this comment once it's running... 🚀 Try running here! 🚀

github-actions[bot] avatar Dec 01 '25 19:12 github-actions[bot]

/try

jhelwig avatar Dec 01 '25 19:12 jhelwig

Okay, starting a try! I'll update this comment once it's running... 🚀 Try running here! 🚀

github-actions[bot] avatar Dec 01 '25 19:12 github-actions[bot]

I'm curious: is it not possible to do S3 writes in parallel for better throughput? Although I suppose we're going to be doing lots of parallel writes for every workspace in the whole service, so maybe a single queue for each layer cache makes sense.

Next steps to increase throughput that I can think of:

  • Rate-limited queue per prefix (S3 scales up write speeds based on the unique key prefix)
  • Rate-limited parallel writes per queue

These would mainly come into play once S3 has scaled up to the point where the single, serial, per-cache queue no longer gets any 503s, though.
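
As a rough illustration of the per-prefix idea only: the key layout (`cas/...`, `workspace/...`), the `prefix_of` helper, and `PrefixQueues` below are all hypothetical, and rate limiting per queue is left out to keep the sketch short.

```python
from collections import defaultdict, deque


def prefix_of(key, depth=1):
    """Hypothetical prefix rule: the first `depth` slash-separated segments of the key."""
    return "/".join(key.split("/")[:depth])


class PrefixQueues:
    """One FIFO queue per key prefix, so each prefix can be drained independently."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def enqueue(self, key, body):
        self.queues[prefix_of(key)].append((key, body))

    def depths(self):
        """Queue depth per prefix."""
        return {prefix: len(q) for prefix, q in self.queues.items()}

    def pop_one_per_prefix(self):
        """Take the oldest write from each prefix queue (one candidate write per prefix)."""
        batch = []
        for prefix in list(self.queues):
            q = self.queues[prefix]
            batch.append(q.popleft())
            if not q:
                del self.queues[prefix]  # drop empty queues so depths() stays tidy
        return batch
```

Each entry returned by `pop_one_per_prefix` could then be written concurrently, with a separate backoff tracked per prefix.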

jhelwig avatar Dec 01 '25 21:12 jhelwig

/try

jhelwig avatar Dec 01 '25 22:12 jhelwig

Okay, starting a try! I'll update this comment once it's running... 🚀 Try running here! 🚀

github-actions[bot] avatar Dec 01 '25 22:12 github-actions[bot]