Nick Stogner

101 comments by Nick Stogner

The complication with option B is that nodes might not have sufficient storage or memory.

@radu-catrangiu Thanks for taking the time to detail all of this. A few things that will make this more complicated:
* PR (#304) that will expand on the number of...

vLLM issue to watch: https://github.com/vllm-project/vllm/issues/8523

I see 3 main options:
1. Integrate with vLLM to ask/be-told what the state of the cache is.
2. Sticky sessions based on request attributes (HTTP headers, etc).
3. Calculate...
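To make option 2 concrete, here is a minimal sketch of sticky routing keyed on a request header. It is not how KubeAI or vLLM implements anything. The header name `X-Session-Id`, the backend names, and the functions `pick_backend` and `route` are all hypothetical, and rendezvous hashing is just one reasonable way to map a key to a backend.

```python
import hashlib
import random


def pick_backend(session_key: str, backends: list[str]) -> str:
    """Pick a backend for a key using rendezvous (highest-random-weight) hashing.

    The same key always maps to the same backend while that backend is present,
    so repeated requests are likely to hit a replica with a warm cache.
    """
    if not backends:
        raise ValueError("no backends available")

    def score(backend: str) -> int:
        # Hash the (backend, key) pair; the backend with the highest score wins.
        digest = hashlib.sha256(f"{backend}|{session_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=score)


def route(headers: dict[str, str], backends: list[str],
          header_name: str = "X-Session-Id") -> str:
    """Route on a hypothetical session header; fall back to a random backend.

    Assumption: header lookup here is case-sensitive, which real HTTP
    header handling is not.
    """
    key = headers.get(header_name)
    if key is None:
        return random.choice(backends)
    return pick_backend(key, backends)
```

A request carrying `{"X-Session-Id": "user-42"}` is sent to the same backend on every call. Removing some other backend from the list does not move that session, which is the main reason to prefer rendezvous hashing over a plain `hash(key) % len(backends)`.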

Hey there @strus38, the Open WebUI project allows for users to integrate directly with a local instance of Ollama. While this works well when you are running Open WebUI on...

Yes, exactly. I will get a doc written up today that explains how to accomplish what you are trying to do in KubeAI. We still have some rough edges with...

I agree we need to tackle this poor experience soon. We have heard this same feedback on the Discord channel as well.

KubeAI produces some metrics that should help diagnose what is happening here: https://github.com/substratusai/kubeai/blob/8007ab256b536f8b15e31dd14bd2c1b9eadb3e3e/internal/metrics/metrics.go#L43 Can you verify the version of KubeAI you are running? (the image tag on the KubeAI Deployment)....

This might not be workable if different clouds require different resource requests... would need to confirm.