How does tusd handle concurrency and load balancing?
I was playing around with tusd and was wondering what happens when load is balanced across multiple workers (receivers). As far as I understand, a file is sent from the client to the server while the server builds up the file incrementally. However, what happens if you have multiple workers, or replicas in a Kubernetes deployment? I would expect segments to be sent to random workers, which might not have received the previous increments.
One thing that comes to mind is a single shared volume or PV mounted to each worker with a many-read-many-write bind (ReadWriteMany in Kubernetes terms), allowing each worker to write and read. However, this raises some questions:
- Some cloud providers only allow one-writer-many-readers.
- If the above works on a supported cloud provider, what happens if two segments are sent to two different workers and the second segment is received before the first? How would the protocol react?
These are just questions from a user who hasn't dived very deep into the technicalities, so hopefully this provides an answer for future readers. Anyhow, great protocol; I would love to experiment with it and maybe use it in production one day. Thanks!
I just got this answer from GitHub Copilot:
The scenario you described is a common challenge when deploying systems like tusd in a distributed environment with multiple workers or replicas. Here's a breakdown of how tusd handles such situations and the considerations involved:

1. Shared Storage

tusd supports shared storage backends, such as AWS S3, Google Cloud Storage, Azure Blob Storage, or even a shared filesystem. These backends allow multiple workers to access the same storage, ensuring that all workers can read and write the same upload.

- Single-writer-multiple-readers limitation: some cloud providers enforce a single-writer constraint. tusd mitigates this by using locking mechanisms to ensure that only one worker writes to a specific upload at a time. For example, tusd's memorylocker or filelocker implementations can be used to coordinate access (note that the memorylocker only coordinates requests within a single process, so a multi-replica deployment needs the filelocker on a shared filesystem or some other cross-process lock).

2. Out-of-Order Segments

The protocol uses the Upload-Offset header to track the current position of the upload. When a worker receives a segment, it compares the request's Upload-Offset against the offset it has stored. If the segment's offset does not match the expected position, the worker rejects the request with 409 Conflict; the client then fetches the current offset (via a HEAD request) and resumes from there. A sketch of this check follows below.

- Concurrency control: to prevent race conditions where two workers might write to the same upload simultaneously, tusd relies on locking mechanisms. These locks ensure that only one worker can write to a specific upload at any given time.
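To make the offset rule concrete, here is a minimal sketch of the check a tus server performs on a PATCH request. This is illustrative code for the protocol rule, not tusd's actual implementation; the `upload` struct and its in-memory storage are hypothetical stand-ins for a real backend.

```go
package sketch

import (
	"fmt"
	"io"
	"net/http"
	"strconv"
	"sync"
)

// upload holds the server-side state of one tus upload. The in-memory
// fields are a hypothetical stand-in for a real storage backend.
type upload struct {
	mu     sync.Mutex // stands in for tusd's per-upload lock
	offset int64
	data   []byte
}

// handlePatch illustrates the core protocol rule: the Upload-Offset
// header of a PATCH request must equal the offset the server has
// stored; otherwise the server answers 409 Conflict.
func handlePatch(w http.ResponseWriter, r *http.Request, u *upload) {
	clientOffset, err := strconv.ParseInt(r.Header.Get("Upload-Offset"), 10, 64)
	if err != nil {
		http.Error(w, "invalid Upload-Offset", http.StatusBadRequest)
		return
	}

	u.mu.Lock()
	defer u.mu.Unlock()

	if clientOffset != u.offset {
		// Out-of-order segment: reject it. The client reacts with a
		// HEAD request to learn the current offset and resumes there.
		http.Error(w, "offset mismatch", http.StatusConflict)
		return
	}

	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "read error", http.StatusInternalServerError)
		return
	}
	u.data = append(u.data, body...)
	u.offset += int64(len(body))

	w.Header().Set("Upload-Offset", fmt.Sprintf("%d", u.offset))
	w.WriteHeader(http.StatusNoContent)
}
```

The sync.Mutex here only protects against concurrent requests within one process; across replicas it must be replaced by a lock on shared state, which is exactly what tusd's locker implementations provide.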
3. Protocol Behavior

If two segments are sent to different workers and the second segment is received before the first, the second worker notices that the Upload-Offset does not match the expected value and rejects the request with 409 Conflict. Once the first segment has been written, the client can retry the second segment against the updated offset.

4. Deployment Considerations

When deploying tusd in a distributed environment:

- Use a shared storage backend that supports concurrent access.
- Configure a locking mechanism (e.g., Redis, database locks, or tusd's built-in lockers) to coordinate access between workers.
- Ensure that your load balancer uses sticky sessions if possible, so that segments from the same upload are routed to the same worker. This reduces the need for coordination between workers.

5. Experimentation and Testing

To experiment with tusd in a distributed setup:

- Deploy tusd with a shared storage backend (e.g., S3 or a shared filesystem).
- Simulate concurrent uploads with multiple workers and observe how the protocol handles out-of-order segments.
- Test the locking mechanism to ensure that it prevents race conditions (a minimal server wiring for such an experiment is sketched after this answer).

tusd's design is robust and well suited for distributed environments, but proper configuration and testing are essential to ensure smooth operation. If you have further questions or need help setting up a specific scenario, feel free to ask!
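To turn the experimentation step above into something runnable, here is a minimal server wiring a disk-backed store together with a file-based locker. It follows the usage example from tusd's v2 Go package documentation (github.com/tus/tusd/v2); treat the exact import paths and config fields as assumptions to verify against the current docs, and note that `./uploads` must be a volume shared by all replicas for the file locks to coordinate them.

```go
package main

import (
	"log"
	"net/http"

	"github.com/tus/tusd/v2/pkg/filelocker"
	"github.com/tus/tusd/v2/pkg/filestore"
	"github.com/tus/tusd/v2/pkg/handler"
)

func main() {
	// Store uploads on disk. For multiple replicas this directory must
	// be a shared volume (e.g. a ReadWriteMany PV) or be replaced by a
	// shared backend such as tusd's S3 store.
	store := filestore.New("./uploads")

	// File-based locks live next to the uploads, so every replica that
	// mounts the same volume participates in the same locking scheme.
	locker := filelocker.New("./uploads")

	composer := handler.NewStoreComposer()
	store.UseIn(composer)
	locker.UseIn(composer)

	h, err := handler.NewHandler(handler.Config{
		BasePath:      "/files/",
		StoreComposer: composer,
	})
	if err != nil {
		log.Fatalf("unable to create handler: %s", err)
	}

	http.Handle("/files/", http.StripPrefix("/files/", h))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

For cloud deployments without a shared filesystem, tusd also ships an S3-backed store that can replace the filestore in this sketch.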
The FAQ entry for scaling tus services in general and tusd's documentation on upload locks cover this topic:
- https://tus.io/faq#how-do-i-scale-tus
- https://tus.github.io/tusd/advanced-topics/locks/
Please read them; I am happy to answer any questions that remain open. That being said, we should probably introduce a dedicated page about this topic in the tusd documentation.