Sam Stoelinga

Results: 223 comments by Sam Stoelinga

Filled out the application here (https://github.com/cncf/sandbox/issues/377). Would love your support!

My thinking is that we could treat, e.g., the `X-Session-ID` HTTP header as a signal that a request belongs to a given session. You can set custom HTTP...
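A minimal Python sketch of header-based session affinity, assuming requests are mapped to endpoints with a consistent-hash ring. `HashRing`, `pick_endpoint`, the `replicas` count, and the fallback when no header is present are hypothetical illustrations, not KubeAI's actual implementation. Only the `X-Session-ID` header name comes from the comment.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; each endpoint appears `replicas` times."""

    def __init__(self, endpoints: list, replicas: int = 100):
        self._ring = sorted(
            (_hash(f"{ep}#{i}"), ep) for ep in endpoints for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def pick_endpoint(headers: dict, ring: HashRing, fallback: str) -> str:
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    session_id = normalized.get("x-session-id")
    if not session_id:
        # No session header: defer to whatever default balancing is in use.
        return fallback
    # Requests sharing a session ID always hash to the same endpoint.
    return ring.lookup(session_id)
```

With a fixed endpoint list, every request carrying the same session ID lands on the same endpoint, and adding or removing an endpoint only remaps the sessions that hashed near it on the ring.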

I think it's important that the user has control over the behavior, so I see a future where we do both option 2 and option 3. Option 3 would be nice due...

One more thought that came to mind for option 3: we could take the first 100, 500, and 1000 characters and hash each of those prefixes.
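A minimal Python sketch of that multi-length prefix hashing, assuming the router remembers which endpoint first served each prefix hash and prefers the longest match. `prefix_hashes`, `PrefixRouter`, and the longest-match policy are hypothetical; only the 100/500/1000 lengths come from the comment.

```python
import hashlib

# Longest first, so the most specific prefix match wins.
PREFIX_LENGTHS = (1000, 500, 100)


def prefix_hashes(prompt: str) -> dict:
    """Hash the first N characters of the prompt for each N it is long enough to fill."""
    return {
        n: hashlib.sha256(prompt[:n].encode("utf-8")).hexdigest()
        for n in PREFIX_LENGTHS
        if len(prompt) >= n  # a 300-char prompt is not a real 500-char prefix
    }


class PrefixRouter:
    """Hypothetical router that tracks which endpoint first served each prefix."""

    def __init__(self):
        self._owner = {}  # (prefix length, hash) -> endpoint

    def route(self, prompt: str, default: str) -> str:
        hashes = prefix_hashes(prompt)
        endpoint = default
        # Prefer the endpoint that already served the longest matching prefix.
        for n in PREFIX_LENGTHS:
            h = hashes.get(n)
            if h is not None and (n, h) in self._owner:
                endpoint = self._owner[(n, h)]
                break
        # Claim any prefixes not seen before for the chosen endpoint.
        for n, h in hashes.items():
            self._owner.setdefault((n, h), endpoint)
        return endpoint
```

Two prompts that share their first 500 characters but diverge before character 1000 miss on the 1000-character hash and match on the 500-character one, so the second request follows the first to the endpoint that likely still has that prefix cached.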

Thanks for trying it out and filing the issue. What's the reason for enabling the Ollama API? We use the same OpenAI-compatible API so we can re-use it across...

Did you patch the model to enable PrefixAware load balancing? One thing that helped us a lot was enabling Grafana and Prometheus metrics so we could see prefix cache...

Take a look here: https://www.kubeai.org/how-to/configure-text-generation-models/#insecure-model-pulling

You can configure a custom registry endpoint:

```
spec:
  url: ollama://my-local-registry:5000/my-model
```

In case your registry is insecure:

```
spec:
  url: ollama://my-local-registry:5000/my-model?insecure=true
```

@coyoteXujie Can...

Thanks for the detailed write-up. No need to change anything yet; I need to think some more and review more thoroughly. Another approach could be dynamically tweaking the `replication` number:...

More details on the scenario from @mskouba: I think the combination of having hundreds of instances and a prefix that is shared by a large portion of traffic leads to...

+1 to the +1 removal approach. Appreciate you both digging deep into that. I would like to get the +1 approach and the simulator merged.