ai-dial-core

feature request: advanced load balancing

Open justrp opened this issue 11 months ago • 0 comments

Load balancing currently uses a round-robin strategy. I'd like to have more intelligent balancing strategies:

  • weight-based
  • latency-based (technically, this could be implemented via weights that are controlled by external infrastructure)
  • token capacity/demand-based (different endpoints might have different capacity, but that could also be handled by the weight-based strategy)
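
To make the weight-based idea concrete, here is a minimal sketch of smooth weighted round-robin, one possible way to do it and not anything that exists in ai-dial-core today. The class name, method names, and endpoint keys are all hypothetical. The `set_weight` method is there only to illustrate how an external component could adjust weights from latency or capacity measurements.

```python
from dataclasses import dataclass


@dataclass
class Upstream:
    endpoint: str      # hypothetical endpoint identifier
    weight: int        # configured (static or externally updated) weight
    current: int = 0   # running score used by the smooth algorithm


class SmoothWeightedBalancer:
    """Smooth weighted round-robin: picks are proportional to weights
    and spread out rather than sent in bursts to the heaviest endpoint."""

    def __init__(self, weights: dict[str, int]):
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("at least one endpoint, all weights must be > 0")
        self._upstreams = [Upstream(e, w) for e, w in weights.items()]

    def set_weight(self, endpoint: str, weight: int) -> None:
        # Assumption: an external process (e.g. a latency monitor)
        # calls this to shift traffic between endpoints.
        if weight <= 0:
            raise ValueError("weight must be > 0")
        for u in self._upstreams:
            if u.endpoint == endpoint:
                u.weight = weight
                return
        raise KeyError(endpoint)

    def next(self) -> str:
        total = sum(u.weight for u in self._upstreams)
        # Every endpoint earns its weight on each round...
        for u in self._upstreams:
            u.current += u.weight
        # ...and the one with the highest score is chosen and pays
        # back the total, so it is not picked again immediately.
        best = max(self._upstreams, key=lambda u: u.current)
        best.current -= total
        return best.endpoint
```

With weights of 5, 1, and 1, any seven consecutive picks starting from a fresh balancer contain the first endpoint five times and each of the others once. With all weights equal, the behaviour reduces to plain round-robin, so the current strategy would remain a special case.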

Some alternative solutions already claim to have this (e.g. https://github.com/microsoft/AICentral)

justrp, Mar 27 '24 16:03