Sam Stoelinga

223 comments by Sam Stoelinga

I checked the lingo logs and it indeed seems that requests are not spread evenly across the different endpoints. See the results for each lingo replica. Replica 1: ``` k logs lingo-597b8794c9-rms8r...

Agreed, I can't be sure about that. I think we should try to reproduce this with STAPI, min=10, max=10, wait for all replicas to be up, and then do this...

Another dataset from a recent run that's been running stable at 25 pods for a long time: ``` k logs lingo-597b8794c9-6j7b2 | grep "Sending request to backend" | awk '{ print...
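A pipeline along these lines can tally requests per backend from a replica's logs. This is a sketch: the original awk command is truncated above, so the field position (`$NF`) and the `count_per_backend` helper name are assumptions.

```shell
# Count "Sending request to backend" log lines per backend endpoint.
# $NF assumes the backend address is the last field of the log line.
count_per_backend() {
  grep "Sending request to backend" \
    | awk '{ print $NF }' \
    | sort | uniq -c | sort -rn
}

# Example on synthetic log lines; in practice you would pipe in
# `k logs <lingo-pod>` instead.
printf '%s\n' \
  'Sending request to backend 10.0.0.1:8080' \
  'Sending request to backend 10.0.0.2:8080' \
  'Sending request to backend 10.0.0.1:8080' \
  | count_per_backend
```

An even spread across endpoints would show roughly equal counts per backend; a skewed distribution like the one above points at the load-balancing issue.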

This is now resolved since we refactored in KubeAI

This seems to be due to the lingo health check that hits `/`

Archiving old issues that likely aren't relevant after the KubeAI refactor.

Seems this is now supported by setting an environment variable:

```
OLLAMA_CONTEXT_LENGTH=8192
```

Could you give this a try in your model spec?

```yaml
spec:
  env:
    OLLAMA_CONTEXT_LENGTH: "8192"
```

Source: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size

Workaround: remove the finalizer on the model object.
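For example, with kubectl (the model name here is hypothetical, and this is a cluster-side workaround, not a fix):

```shell
# Clear all finalizers on the Model object so deletion can proceed.
# "my-model" is a placeholder; only do this when the controller is
# genuinely stuck and won't remove the finalizer itself.
kubectl patch model my-model --type=merge -p '{"metadata":{"finalizers":null}}'
```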

I got the request info and it turned out that we crap out when this is the URL:

```
DEBUG:aiohttp.client:Starting request
```

Notice the extra slash after `openai`. I will...

I do think we need to fix this, btw. We should simply strip the extra `/` in the request URL so others don't hit similar issues. It's very easy to...
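A minimal sketch of that normalization, collapsing repeated slashes in the path while leaving the `://` after the scheme intact (the `normalize_url` helper name is mine, not from the codebase):

```shell
# Collapse runs of slashes in a URL's path, e.g. "/openai//v1" -> "/openai/v1".
# The regex requires a non-colon, non-slash character before the run, so the
# "//" in "http://" is never touched.
normalize_url() {
  printf '%s' "$1" | sed -E 's#([^:/])/{2,}#\1/#g'
}

normalize_url 'http://host/openai//v1/chat/completions'
# -> http://host/openai/v1/chat/completions
```

This only handles the simple duplicated-slash case; whatever the real fix looks like in the proxy, it should normalize before routing so both `.../openai/v1/...` and `.../openai//v1/...` reach the same backend path.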