Open-Assistant
Add distributed testing for inference server
Overview
We want to test the dockerised inference server under different stress conditions, such as:
- Load testing - handling many concurrent users
- Latency testing - measuring the speed of responses to users
This should inform changes to the inference server, as it can help diagnose bottlenecks in the backend. It also gives us a better idea of the compute requirements for hosting a worker or inference server node under different conditions; a minimal sketch of such a test is given below.
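As a rough illustration of what one of these tests could look like, here is a minimal Locust sketch. This is an assumed approach, not the project's chosen tooling, and the `/chat` endpoint path and JSON payload are hypothetical placeholders rather than the real inference server API:

```python
# locustfile.py - minimal load/latency test sketch.
# Assumes the inference server is reachable over HTTP; the endpoint
# path and payload below are hypothetical placeholders.
from locust import HttpUser, task, between


class InferenceUser(HttpUser):
    # Each simulated user pauses 1-3 seconds between requests.
    wait_time = between(1, 3)

    @task
    def send_prompt(self):
        # Hypothetical route; replace with the actual inference API endpoint.
        self.client.post("/chat", json={"message": "Hello, world!"})
```

Running e.g. `locust -f locustfile.py --host http://localhost:8000 --users 100 --spawn-rate 10` would ramp up concurrent users while Locust records throughput and response-time percentiles, covering both of the conditions listed above.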
Tasks
- [x] #1622
- [ ] #1628
- [ ] #1629
- [ ] #1623
- [ ] #1624
- [ ] #1625
- [ ] #1626
Context
@yk suggested I work on this. I'm a research engineer at Faculty (https://faculty.ai/) committing almost full-time to OS contributions for the foreseeable future.
self-assign
Hi @jackapbutler, are you still interested in working on this? Or shall I unassign you from all load testing issues?
Hey @olliestanley, no sorry I won't be able to commit to this now.
No problem :)