roc-toolkit
Add stress tests for task queues
For background, see #384.
We have unit tests for ctl::TaskQueue and pipeline::TaskPipeline, but unit tests can't detect all possible races. Since the lock-free operations are tricky to implement (especially in ctl::TaskQueue), it's important to write good stress tests that can detect races, and to run them periodically on supported architectures (at least x86_64, arm32, and arm64).
We need two tests: one for ctl::TaskQueue and one for pipeline::TaskPipeline. The first will be more complicated, since TaskQueue supports more operations on tasks than TaskPipeline.
A stress test should repeatedly schedule, re-schedule, cancel, and wait for tasks from multiple threads, at random times, and with random task deadlines. Task processing should also take a random amount of time.
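A minimal sketch of such a driver loop, assuming a hypothetical `run_stress()` helper and `Op` enum (neither exists in roc-toolkit): each worker thread sleeps a random time, then applies a random operation with a random deadline to a randomly chosen task. The `apply` callback is where a real test would call into ctl::TaskQueue; the queue's actual API is deliberately not shown here.

```cpp
#include <chrono>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Hypothetical operations a stress thread may apply to a task. The names are
// illustrative and do not correspond to the actual ctl::TaskQueue API.
enum class Op { Schedule, Reschedule, Cancel, Wait };

// Starts `num_threads` workers; each performs `iterations` random operations on
// a random task index in [0, num_tasks), sleeping a random time before each one.
// Requires num_tasks > 0. `apply(op, task_index, deadline_us)` is the hook where
// a real test would invoke the queue under test.
void run_stress(unsigned num_threads, unsigned iterations, unsigned num_tasks,
                const std::function<void(Op, unsigned, unsigned)>& apply) {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; t++) {
        workers.emplace_back([=, &apply] {
            std::mt19937 rng(1234 + t); // per-thread RNG with a fixed seed
            std::uniform_int_distribution<int> op_dist(0, 3);
            std::uniform_int_distribution<unsigned> task_dist(0, num_tasks - 1);
            std::uniform_int_distribution<unsigned> delay_dist(0, 100);    // microseconds
            std::uniform_int_distribution<unsigned> deadline_dist(0, 500); // microseconds
            for (unsigned i = 0; i < iterations; i++) {
                std::this_thread::sleep_for(std::chrono::microseconds(delay_dist(rng)));
                apply(static_cast<Op>(op_dist(rng)), task_dist(rng), deadline_dist(rng));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

A fixed per-thread seed makes a failing run easier to reproduce; a real test would probably also print the seed so that a failure on CI can be replayed locally.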
The random delays should be chosen so that we periodically get both contended and uncontended cases. The test should have enough randomness to cover the following cases:
- the task is in the ready queue, in the sleeping queue, being processed, being finished, being waited for, being cancelled, or being re-scheduled from another thread
- the event loop thread is working or sleeping when a task operation is invoked
- there are one or many concurrent operations on the queue, with the same or different tasks
- the number of concurrent operations is sometimes smaller than the number of CPUs, and sometimes larger
- the task has or doesn't have a completion handler (such tasks are handled a bit differently)
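One possible way to hit both the contended/uncontended cases and the fewer/more-threads-than-CPUs cases is to draw these parameters from bimodal distributions. The helpers below (`pick_thread_count()`, `pick_delay()`) are hypothetical, and the concrete ranges are arbitrary placeholders rather than tuned values.

```cpp
#include <chrono>
#include <random>

// Hypothetical helper: choose how many worker threads to start for one round.
// Picks "fewer than CPUs" or "more than CPUs" with equal probability, so both
// regimes are exercised over many rounds.
unsigned pick_thread_count(std::mt19937& rng, unsigned num_cpus) {
    if (num_cpus < 2) {
        num_cpus = 2; // also covers hardware_concurrency() returning 0
    }
    std::bernoulli_distribution oversubscribe(0.5);
    if (oversubscribe(rng)) {
        // More threads than CPUs: threads get preempted in the middle of operations.
        std::uniform_int_distribution<unsigned> d(num_cpus + 1, num_cpus * 4);
        return d(rng);
    }
    // Fewer threads than CPUs: operations can run truly in parallel.
    std::uniform_int_distribution<unsigned> d(1, num_cpus - 1);
    return d(rng);
}

// Hypothetical helper: pick a delay before the next operation.
// A zero delay makes threads hit the queue back-to-back (contended case);
// a longer delay gives the event loop a chance to drain and go to sleep
// (uncontended case).
std::chrono::microseconds pick_delay(std::mt19937& rng) {
    std::bernoulli_distribution burst(0.5);
    if (burst(rng)) {
        return std::chrono::microseconds(0);
    }
    std::uniform_int_distribution<int> d(50, 2000);
    return std::chrono::microseconds(d(rng));
}
```

In a real test, `num_cpus` would typically come from `std::thread::hardware_concurrency()`, which may return 0 when the count is unknown; the helper clamps that case.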
The test should ensure that the following invariants are always met:
- any operation with the queue completes eventually (no hangs)
- any scheduled and not cancelled task is processed eventually
- for any scheduled or cancelled task, the completion handler is called eventually
- any pending wait() completes eventually
- if the task was not rescheduled, but only scheduled and possibly cancelled, the handler is invoked exactly once
- the same holds when the task is scheduled again after waiting until it is fully finished; in this case the handler should be invoked exactly one more time
- if the task was rescheduled while it was pending, processing and the handler may each be called twice (one call for the previous schedule, if its deadline had already expired, and one call for the new schedule)
- the task is processed and the handler is called no earlier than the task deadline
- the task state reported via pending(), success(), and cancelled() should match the expected state
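The handler-count and deadline invariants could be checked with per-task bookkeeping like the hypothetical `TaskTracker` below. It assumes the test records each schedule before issuing it, calls `on_handler()` from the completion handler, calls `finish_round()` after wait() returns, and re-schedules at most once per round. It also assumes that a cancelled task may complete before its deadline, so the deadline is only checked for non-cancelled tasks. Deadlines are compared against the earliest deadline seen in the round, because after a re-schedule the handler may belong to either schedule; this is weaker than an exact check but avoids false alarms.

```cpp
#include <cstdint>
#include <limits>
#include <mutex>

// Hypothetical per-task bookkeeping for one "round": from the first schedule
// until wait() confirms the task is fully finished. Timestamps are nanoseconds
// from a monotonic clock (an assumption; a real test would use the queue's clock).
class TaskTracker {
public:
    // Must be called before the task is actually scheduled.
    void on_schedule(std::int64_t deadline_ns) {
        std::lock_guard<std::mutex> lock(mutex_);
        update_deadline_(deadline_ns);
    }

    // Called when the task is re-scheduled while it may still be pending.
    // Assumes at most one re-schedule per round.
    void on_reschedule(std::int64_t deadline_ns) {
        std::lock_guard<std::mutex> lock(mutex_);
        rescheduled_ = true;
        update_deadline_(deadline_ns);
    }

    // Called from the completion handler.
    void on_handler(std::int64_t now_ns, bool cancelled) {
        std::lock_guard<std::mutex> lock(mutex_);
        handler_calls_++;
        // Only non-cancelled tasks are checked against the earliest deadline of the round.
        if (!cancelled && now_ns < min_deadline_ns_) {
            early_ = true;
        }
    }

    // Called after wait() returned. Returns true if the round met the invariants,
    // then resets the state so the task can be scheduled again.
    bool finish_round() {
        std::lock_guard<std::mutex> lock(mutex_);
        const unsigned max_calls = rescheduled_ ? 2 : 1;
        const bool ok = !early_ && handler_calls_ >= 1 && handler_calls_ <= max_calls;
        handler_calls_ = 0;
        rescheduled_ = false;
        early_ = false;
        min_deadline_ns_ = std::numeric_limits<std::int64_t>::max();
        return ok;
    }

private:
    void update_deadline_(std::int64_t d) {
        if (d < min_deadline_ns_) {
            min_deadline_ns_ = d;
        }
    }

    std::mutex mutex_;
    unsigned handler_calls_ = 0;
    bool rescheduled_ = false;
    bool early_ = false;
    std::int64_t min_deadline_ns_ = std::numeric_limits<std::int64_t>::max();
};
```

For example, a round with one schedule and two handler calls fails, while the same two calls pass if the task was re-scheduled in between.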