Varun Gupta

Results: 20 issues by Varun Gupta

We are using Cassandra version 3.0.14 with the gocql driver. The number of open connections is 15k (which I understand is an anti-pattern). Reads = 8k qps, writes = 3k qps. Latencies...

waiting-for-info

### Scenario 1: model adapter load keeps failing indefinitely I added an error in model load. As expected, the model does not load. But the problem is that, in the model adapter, it...

kind/enhancement
help wanted
area/lora
event/bugbash

### 🐛 Describe the bug ![image](https://github.com/user-attachments/assets/2ef727f7-a390-440b-8b97-ab985a964fd0) ### Steps to Reproduce _No response_ ### Expected behavior _No response_ ### Environment _No response_

Addresses https://github.com/aibrix/aibrix/issues/302

## Pull Request Description Use a string-based tokenizer to replace the OpenAI tokenizer. The reason is that the latency overhead of the OpenAI tokenizer was 50 to 100 ms. ## Related Issues Resolves: #673 **Important:...

## Pull Request Description Ignores worker pods for gateway routing ## Related Issues Resolves: #[Insert issue number(s)] **Important: Before submitting, please complete the description above and review the checklist below.**...

## Pull Request Description Make the stream's include usage an optional parameter. If the request is for a user (a user has a default TPM limit if none is configured), then the stream's include usage is...

## Pull Request Description Along with the pod ready condition check, add a check that the containers are ready as well. ## Related Issues Resolves: #781 **Important: Before submitting, please complete the description above...