Open-Assistant
Load test inference-server on different hardware
We want to test how many users the inference-server can serve, and with what response times, on setups with different numbers and types of GPU and CPU devices.
On the Stability AI cluster we can perform the load tests with 8, 16, 32, or 128 preemptible GPUs.
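For context, a test like this can be sketched with a simple async client that simulates N concurrent users and records per-request latencies. This is a minimal sketch only: the endpoint URL, payload shape, and user counts below are illustrative assumptions, not the actual inference-server API.

```python
# Minimal load-test sketch. Assumes a hypothetical HTTP endpoint at
# http://localhost:8000/chat that accepts {"message": ...} — adjust to
# whatever API the inference-server actually exposes.
import asyncio
import time

import aiohttp

ENDPOINT = "http://localhost:8000/chat"  # hypothetical endpoint
CONCURRENT_USERS = 32                    # simulated simultaneous users
REQUESTS_PER_USER = 10


async def user_session(session: aiohttp.ClientSession, latencies: list[float]) -> None:
    """Simulate one user sending sequential requests and record latencies."""
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        async with session.post(ENDPOINT, json={"message": "Hello!"}) as resp:
            await resp.read()
        latencies.append(time.perf_counter() - start)


async def main() -> None:
    latencies: list[float] = []
    async with aiohttp.ClientSession() as session:
        # Run all simulated users concurrently against the server.
        await asyncio.gather(
            *(user_session(session, latencies) for _ in range(CONCURRENT_USERS))
        )
    latencies.sort()
    print(f"requests: {len(latencies)}")
    print(f"p50 latency: {latencies[len(latencies) // 2]:.3f}s")
    print(f"p95 latency: {latencies[int(len(latencies) * 0.95)]:.3f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

Sweeping `CONCURRENT_USERS` upward on each hardware configuration would give a rough users-vs-latency curve per GPU/CPU setup.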
self-assign
@jackapbutler do you need help with this or any related tasks? Happy to pick up something this weekend if there are any side-tasks.
Hey @alando46, there's currently a bit of upfront investment needed to get it up and running with some basic cases, but once that's settled you can definitely help out with the tasks 🙂
Sounds good, I've got a few years of experience setting up batch-based deep learning systems for image processing. Language models are new to me, but I think there is likely some overlap. Comfortable with Docker, a good amount of k8s, and general serving tools. Let me know if you get to a spot where a second set of hands would be helpful. @jackapbutler