ai-dial-core
feature request: advanced load balancing
Load balancing currently uses a round-robin strategy. I'd like to have more intelligent balancing strategies:
- weights-based
- latency-based (technically, this could be implemented via weights controlled by external infrastructure)
- token capacity/demand-based (different endpoints might have different capacity, but that could also be solved by the weights-based strategy)
Some alternative solutions already claim to have this (e.g. https://github.com/microsoft/AICentral)
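
To make the weights-based option concrete, here is a minimal sketch of smooth weighted round robin (the scheme popularized by nginx). It is only an illustration of the idea, not a proposal for the actual ai-dial-core code: `Upstream`, `SmoothWeightedBalancer`, `set_weight`, and the example URLs are all hypothetical names. The `set_weight` method stands in for the "weights controlled by external infrastructure" case, where a latency or capacity monitor would adjust weights at runtime.

```python
from dataclasses import dataclass


@dataclass
class Upstream:
    """Hypothetical upstream endpoint; all names here are illustrative."""
    url: str
    weight: int       # static weight, e.g. proportional to token capacity
    current: int = 0  # running counter used by the selection algorithm


class SmoothWeightedBalancer:
    """Smooth weighted round robin over a fixed set of upstreams."""

    def __init__(self, upstreams):
        self.upstreams = list(upstreams)
        if not self.upstreams:
            raise ValueError("at least one upstream is required")
        if any(u.weight <= 0 for u in self.upstreams):
            raise ValueError("weights must be positive")
        self.total = sum(u.weight for u in self.upstreams)

    def set_weight(self, url, weight):
        """Update a weight at runtime (e.g. driven by an external latency monitor)."""
        if weight <= 0:
            raise ValueError("weights must be positive")
        for u in self.upstreams:
            if u.url == url:
                u.weight = weight
        self.total = sum(u.weight for u in self.upstreams)

    def next(self):
        """Pick the next upstream, spreading heavy upstreams evenly over time."""
        # Every upstream earns credit equal to its weight on each pick.
        for u in self.upstreams:
            u.current += u.weight
        # The upstream with the most credit wins (first one wins on ties)...
        chosen = max(self.upstreams, key=lambda u: u.current)
        # ...and pays back the total weight, so others catch up.
        chosen.current -= self.total
        return chosen


balancer = SmoothWeightedBalancer([
    Upstream("https://a.example", weight=5),
    Upstream("https://b.example", weight=1),
    Upstream("https://c.example", weight=1),
])
picks = [balancer.next().url[8] for _ in range(7)]
print("".join(picks))  # aabacaa
```

With weights 5/1/1, each cycle of seven requests sends five to the first endpoint, and the picks are interleaved rather than sent in a burst. Plain round robin is the special case where all weights are equal.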