Open-Assistant
Load test inference-server on different hardware
We want to test how many users the inference-server can serve, and with what response times, on setups with different numbers and types of GPU and CPU devices.
On the Stability AI cluster we can perform the load tests with 8, 16, 32, or 128 preemptible GPUs.
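For context, a test like this can be sketched with a simple async client that simulates N concurrent users and records per-request latencies. This is a minimal sketch only: the endpoint URL, payload shape, and user counts below are illustrative assumptions, not the actual inference-server API.

```python
# Minimal load-test sketch. Assumes a hypothetical HTTP endpoint at
# http://localhost:8000/chat that accepts {"message": ...} — adjust to
# whatever API the inference-server actually exposes.
import asyncio
import time

import aiohttp

ENDPOINT = "http://localhost:8000/chat"  # hypothetical endpoint
CONCURRENT_USERS = 32                    # simulated simultaneous users
REQUESTS_PER_USER = 10


async def user_session(session: aiohttp.ClientSession, latencies: list[float]) -> None:
    """Simulate one user sending sequential requests and record latencies."""
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        async with session.post(ENDPOINT, json={"message": "Hello!"}) as resp:
            await resp.read()
        latencies.append(time.perf_counter() - start)


async def main() -> None:
    latencies: list[float] = []
    async with aiohttp.ClientSession() as session:
        # Run all simulated users concurrently against the server.
        await asyncio.gather(
            *(user_session(session, latencies) for _ in range(CONCURRENT_USERS))
        )
    latencies.sort()
    print(f"requests: {len(latencies)}")
    print(f"p50 latency: {latencies[len(latencies) // 2]:.3f}s")
    print(f"p95 latency: {latencies[int(len(latencies) * 0.95)]:.3f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

Sweeping `CONCURRENT_USERS` upward on each hardware configuration would give a rough users-vs-latency curve per GPU/CPU setup.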
self-assign
@jackapbutler do you need help with this or any related tasks? Happy to pick up something this weekend if there are any side-tasks.
Hey @alando46, there's currently a bit of upfront investment needed to get it up and running with some basic cases, but once that's settled you can definitely help out with the tasks 🙂
Sounds good, I've got a few years of experience setting up batch-based deep learning systems for image processing. Language models are new to me, but I think there is likely some overlap. Comfortable with Docker, a good amount of k8s, and general serving tools. Let me know if you get to a spot where a second set of hands would be helpful. @jackapbutler