Sam Stoelinga comments

Results: 223 comments of Sam Stoelinga

Is there any plan on becoming a CNCF project?

Filled out the application here: https://github.com/cncf/sandbox/issues/377. Would love your support!

Route requests to take advantage of prefix caching

My thinking is we could treat e.g. the `X-Session-ID` HTTP header as a way to tell us that a request belongs to the same session. You can set custom HTTP...
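A minimal sketch of the session-affinity idea, assuming a hypothetical `pick_backend` helper and made-up backend names; it is not taken from the project's code. Requests that carry the same `X-Session-ID` value hash to the same backend, so later turns of a conversation can land where the prefix is already cached.

```python
import hashlib
from typing import Mapping, Optional, Sequence


def pick_backend(headers: Mapping[str, str], backends: Sequence[str]) -> Optional[str]:
    """Return a backend for the session, or None when no session header is set."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    session_id = normalized.get("x-session-id")
    if not session_id or not backends:
        return None  # the caller falls back to its default strategy

    # Use a stable hash (not Python's salted hash()) so every router
    # process maps the same session to the same backend.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Plain modulo hashing remaps most sessions whenever the backend list changes; a consistent-hashing ring would reduce that churn at the cost of more code.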

Route requests to take advantage of prefix caching

I think it's important that the user has control over the behavior, so I see a future where we do both options 2 and 3. Option 3 would be nice due...

Route requests to take advantage of prefix caching

One more thought that came to mind for option 3. We can take the first 100 characters, 500 characters and 1000 characters and do hashing based on those.
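A rough sketch of what hashing at several prefix lengths could look like. The function name `prefix_hashes`, the choice of SHA-256, and the rule of skipping lengths the prompt cannot fill are all assumptions made for illustration, not something the comment specifies.

```python
import hashlib
from typing import Dict, Sequence

PREFIX_LENGTHS = (100, 500, 1000)  # the character counts mentioned above


def prefix_hashes(prompt: str, lengths: Sequence[int] = PREFIX_LENGTHS) -> Dict[int, str]:
    """Map each prefix length to a hash of the first N characters of the prompt."""
    hashes = {}
    for n in lengths:
        if len(prompt) < n:
            continue  # assumption: skip lengths the prompt cannot fill
        hashes[n] = hashlib.sha256(prompt[:n].encode("utf-8")).hexdigest()
    return hashes
```

Two prompts that share their first 500 characters produce the same hashes at 100 and 500 and different ones at 1000, so a router could prefer the backend that matches the longest shared prefix.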

Connections/Ollama API testing - http://host.docker.internal:11434 is failing

Thanks for trying it out and filing the issue. What's the reason for enabling the Ollama API? We utilize the same OpenAI compatible API so we can re-use it across...

MI300X - Not seeing any difference in Round Robin vs. PrefixHash-Aware Load Balancing

Did you patch the model to enable PrefixAware load balancing? One thing that helped us a lot was to enable Grafana and Prometheus metrics so we can see prefix cache...

How do I start a model directly from model parameters?

Take a look here: https://www.kubeai.org/how-to/configure-text-generation-models/#insecure-model-pulling

You can configure a custom registry endpoint:

```
spec:
  url: ollama://my-local-registry:5000/my-model
```

In case your registry is insecure:

```
spec:
  url: ollama://my-local-registry:5000/my-model?insecure=true
```

@coyoteXujie Can...

Add Minimum Load Threshold Setting
