Nick Stogner
Note: after deploying the config updates, you will need to restart the KubeAI and model Pods (`kubectl delete pod`) for the changes to take effect. We are currently working...
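For reference, the restart just means deleting the Pods so their Deployments recreate them with the new config. A minimal sketch (the label selectors below are assumptions and may differ in your install, so check your Pods' labels first):

```shell
# Delete the KubeAI controller Pods and the model-serving Pods so that
# their Deployments recreate them with the updated config.
# NOTE: both label selectors are assumptions; verify against your install.
kubectl delete pod -l app.kubernetes.io/name=kubeai
kubectl delete pod -l app.kubernetes.io/managed-by=kubeai
```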
> which kueue labels concretely, just queue-name or some other too? This entails a separate script I guess which iterates over all jobs and patches them, right?

> How we...
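A script along those lines could be fairly small: iterate over the Jobs and patch the Kueue queue-name label onto each one. A rough sketch (the queue name `my-queue` is a placeholder, and whether queue-name alone is sufficient is exactly the open question above):

```shell
# Rough sketch: patch every Job in the current namespace with a Kueue
# queue-name label. "my-queue" is a placeholder; substitute your LocalQueue.
for job in $(kubectl get jobs -o name); do
  kubectl patch "$job" --type merge \
    -p '{"metadata":{"labels":{"kueue.x-k8s.io/queue-name":"my-queue"}}}'
done
```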
> Could you confirm this?

Yes, I will test this now.
@alculquicondor A quick manual test appears to confirm the behavior: the Eviction condition is added but not acted upon (and resources are not freed up for the preemptor job to...
Thanks for the suggestions @meetzuber!
First of all, sorry about the delayed response, we had a lapse in Issue support. @nestoras KubeAI does not propagate Model labels to the Pods that are created to serve...
Hey Kai, as we discussed over chat, vLLM is typically the go-to for serving concurrent production traffic. Does that work for you, or is Ollama caching still important for you?
That makes sense. We currently have 2 high priority features that we are focusing on: #132 and #266 ... We can probably fit this feature in after those.
Thanks for filing the issue, will take a look soon!
After reviewing this, I think we will wait for this to be fixed in the Ollama project since we are just providing this script as an example. Feel free to...