Varun Gupta
We are using Cassandra version 3.0.14 with the gocql driver. The number of open connections is 15k (which I understand is an anti-pattern). Read = 8k qps, Write = 3k qps. Latencies...
### Scenario 1 model adapter load is failing indefinitely I added an error in model load. As expected, the model is not loading. But the problem is that, in the model adapter, it...
### 🐛 Describe the bug  ### Steps to Reproduce _No response_ ### Expected behavior _No response_ ### Environment _No response_
Address https://github.com/aibrix/aibrix/issues/302
## Pull Request Description Use a string-based tokenizer to replace the OpenAI tokenizer. The reason is that the latency overhead of the OpenAI tokenizer was 50 to 100 ms. ## Related Issues Resolves: #673 **Important:...
## Pull Request Description Ignores worker pods for gateway routing ## Related Issues Resolves: #[Insert issue number(s)] **Important: Before submitting, please complete the description above and review the checklist below.**...
## Pull Request Description Make stream include usage an optional parameter. If the request is for a user (the user has a default tpm limit, if not configured), then the stream's include usage is...
## Pull Request Description Along with the pod ready condition check, add a check for containers ready as well. ## Related Issues Resolves: #781 **Important: Before submitting, please complete the description above...