Sam Stoelinga

223 comments by Sam Stoelinga

I checked the lingo logs and it indeed seems that requests are not spread evenly across the different endpoints. See the results for each lingo replica. Replica 1: ``` k logs lingo-597b8794c9-rms8r...

Agreed, I can't be sure about that. I think we should try to reproduce this with STAPI, min=10, max=10, wait for all replicas to be up, and then do this...

Another dataset from a recent run that's been running stable at 25 pods for a long time: ``` k logs lingo-597b8794c9-6j7b2 | grep "Sending request to backend" | awk '{ print...
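A pipeline along these lines can tally requests per backend from a replica's logs. This is a sketch: the original awk command is truncated above, so the field position (`$NF`) and the `count_per_backend` helper name are assumptions.

```shell
# Count "Sending request to backend" log lines per backend endpoint.
# $NF assumes the backend address is the last field of the log line.
count_per_backend() {
  grep "Sending request to backend" \
    | awk '{ print $NF }' \
    | sort | uniq -c | sort -rn
}

# Example on synthetic log lines; in practice you would pipe in
# `k logs <lingo-pod>` instead.
printf '%s\n' \
  'Sending request to backend 10.0.0.1:8080' \
  'Sending request to backend 10.0.0.2:8080' \
  'Sending request to backend 10.0.0.1:8080' \
  | count_per_backend
```

An even spread across endpoints would show roughly equal counts per backend; a skewed distribution like the one above points at the load-balancing issue.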

This is now resolved since we refactored in KubeAI

This seems to be due to the lingo health check that hits `/`

Archiving old issues that likely aren't relevant after the KubeAI refactor.

Seems this is now supported by setting an environment variable:

```
OLLAMA_CONTEXT_LENGTH=8192
```

Could you give this a try in your model spec?

```yaml
spec:
  env:
    OLLAMA_CONTEXT_LENGTH: "8192"
```

Source: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size

Workaround: remove the finalizer on the model object.
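For example, with kubectl (the model name here is hypothetical, and this is a cluster-side workaround, not a fix):

```shell
# Clear all finalizers on the Model object so deletion can proceed.
# "my-model" is a placeholder; only do this when the controller is
# genuinely stuck and won't remove the finalizer itself.
kubectl patch model my-model --type=merge -p '{"metadata":{"finalizers":null}}'
```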

I got the request info and it turned out that we crap out when this is the URL:

```
DEBUG:aiohttp.client:Starting request
```

Notice the extra slash after `openai`. I will...

I do think we need to fix this, btw. We should simply strip the extra `/` in the request URL so others don't hit similar issues. It's very easy to...
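A minimal sketch of that normalization, collapsing repeated slashes in the path while leaving the `://` after the scheme intact (the `normalize_url` helper name is mine, not from the codebase):

```shell
# Collapse runs of slashes in a URL's path, e.g. "/openai//v1" -> "/openai/v1".
# The regex requires a non-colon, non-slash character before the run, so the
# "//" in "http://" is never touched.
normalize_url() {
  printf '%s' "$1" | sed -E 's#([^:/])/{2,}#\1/#g'
}

normalize_url 'http://host/openai//v1/chat/completions'
# -> http://host/openai/v1/chat/completions
```

This only handles the simple duplicated-slash case; whatever the real fix looks like in the proxy, it should normalize before routing so both `.../openai/v1/...` and `.../openai//v1/...` reach the same backend path.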