
LLMPilot: Generate the best deployment configuration for model + GPU combination

Open Jeffwan opened this issue 1 year ago • 6 comments

🚀 Feature Description and Motivation

For a 33B model deployment, we have a few GPU options: A10, V100-32GiB, L20, and L40. Technically, we can launch instances using M GPU types × N GPU counts. However, we need to evaluate which plan is optimal for given latency, throughput, and cost goals.

Selecting the optimal GPU deployment for a model is a complex task that requires careful evaluation of these key metrics. By running benchmarks, analyzing costs, and incorporating community input, we can make an informed decision that meets our project goals. This RFC serves as a starting point for the discussion and invites contributions from all stakeholders.

Use Case

As a user, I want to know the best GPU types on which to run a specific model.

Proposed Solution

  1. benchmark tools + benchmark datasets (pluggable)
  2. experiment plans
  3. generate results
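To make the proposal concrete, here is a minimal sketch of what the "experiment plans + generate results" steps could look like: enumerate GPU type × count combinations and keep the cheapest plan that meets the latency/throughput goals. All names, numbers, and prices below are illustrative assumptions, not real benchmark results or an actual AIBrix API.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    gpu_type: str
    gpu_count: int
    p99_latency_ms: float   # measured by the benchmark tool
    throughput_tps: float   # tokens/sec, measured by the benchmark tool
    cost_per_hour: float    # gpu_count * hypothetical unit price

def best_plan(plans, max_latency_ms, min_throughput_tps):
    """Return the cheapest plan that satisfies the SLO goals, or None."""
    feasible = [p for p in plans
                if p.p99_latency_ms <= max_latency_ms
                and p.throughput_tps >= min_throughput_tps]
    return min(feasible, key=lambda p: p.cost_per_hour, default=None)

# Illustrative candidate plans for a 33B model (made-up numbers).
candidates = [
    Plan("A10", 4, 950.0, 120.0, 4 * 1.2),
    Plan("V100-32GiB", 4, 700.0, 180.0, 4 * 2.5),
    Plan("L20", 2, 600.0, 200.0, 2 * 2.0),
    Plan("L40", 2, 450.0, 260.0, 2 * 3.0),
]

choice = best_plan(candidates, max_latency_ms=800.0, min_throughput_tps=150.0)
print(choice.gpu_type if choice else "no feasible plan")  # → L20
```

In practice the candidate plans would be filled in by the pluggable benchmark tools and datasets from step 1, and the selection objective could be swapped for any cost/latency/throughput weighting.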

Jeffwan avatar Aug 22 '24 07:08 Jeffwan

The VKE team already has some tools; we should review and evaluate that work.

Jeffwan avatar Aug 22 '24 07:08 Jeffwan

Model parameters like context length and parallelism can differ, which adds challenges to getting an apples-to-apples comparison.
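One way to keep comparisons apples-to-apples is to pin the workload parameters across all runs so that only the deployment-side knobs vary. A hypothetical sketch (the parameter names and values here are illustrative assumptions, not an existing config schema):

```python
# Workload parameters are pinned once so every benchmark run sees the
# same traffic shape; only the deployment configuration varies per run.
FIXED_WORKLOAD = {
    "max_context_length": 4096,
    "input_tokens": 512,
    "output_tokens": 128,
    "request_rate_qps": 2.0,
}

def run_config(gpu_type, tensor_parallel):
    # Merge the pinned workload with the per-experiment deployment knobs.
    return {**FIXED_WORKLOAD,
            "gpu_type": gpu_type,
            "tensor_parallel": tensor_parallel}
```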

Jeffwan avatar Aug 22 '24 07:08 Jeffwan

We should leverage the deepseek-33b case to refine the solution here. @kr11 Let's have a short discussion tomorrow on next steps. VKE will publish their tools, and we can probably leverage the parameter tuning in the long run.

Jeffwan avatar Aug 29 '24 13:08 Jeffwan

In v0.1.0, we should focus on using, polishing, and improving the existing tools built by VKE.

Jeffwan avatar Sep 11 '24 00:09 Jeffwan

Parameter tuning and profiling would be advanced features; we plan to work on them in v0.2.0.

Jeffwan avatar Oct 17 '24 00:10 Jeffwan

This covers the auto-tuning and profiling related stories. We also came up with ideas like LLMPilot; v0.3.0 is too tight for this story, so it can be postponed to v0.4.0.

Jeffwan avatar Apr 28 '25 22:04 Jeffwan