Sam Stoelinga
I can reproduce with a simple curl command as well:

```
curl -v http://localhost:8000/openai//v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2-500m-cpu", "prompt": "Who was the first president...
```
`-L` with curl doesn't work either and returns the 400 error. I will rewrite the integration test to use `-L` as well. Good catch, all!

Full output:

```
curl -v...
```
The root cause seems to be that a POST gets automatically redirected as a GET when a 301 is used: https://datatracker.ietf.org/doc/html/rfc7231#section-6.4.2

We should use an HTTP 307 or 308 to keep it as a POST.
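To illustrate the difference, here is a minimal Go sketch (not the actual KubeAI proxy code; the path handling is hypothetical) of redirecting a request with a doubled slash while preserving the method:

```go
package main

import (
	"net/http"
	"strings"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Collapse an accidental double slash (e.g. /openai//v1/completions).
		if strings.Contains(r.URL.Path, "//") {
			clean := strings.ReplaceAll(r.URL.Path, "//", "/")
			// With http.StatusMovedPermanently (301), clients may replay the
			// request as a GET without the body. 308 (StatusPermanentRedirect)
			// requires them to re-send the same method and body.
			http.Redirect(w, r, clean, http.StatusPermanentRedirect)
			return
		}
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8000", mux)
}
```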
It's an issue with curl as well. The weird thing is that my integration test doesn't reproduce it, but I can very much reproduce the 400 error in my local...
I was able to reproduce in automated testing as well: https://github.com/substratusai/kubeai/actions/runs/11127386385/job/30919572183?pr=259#step:6:593
In the past I only had to use custom chat templates for specific models.
Seems the response isn't exactly following the OpenAI response format. The OpenAI docs show a top-level `"object": "list"`, while this is what Infinity returns:

```
{
  "object": "embedding",
  "data": [
    {
      "object": "embedding",
      ...
```
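For reference, the embeddings response documented by OpenAI looks like this (abridged from the API reference; the embedding values are placeholders):

```
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0023064255, -0.009327292],
      "index": 0
    }
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```

So the top-level `object` should be `list`, not `embedding`.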
I confirmed that this is blocking integration with KubeAI:

```
INFO: 10.244.0.15:35798 - "POST /v1/embeddings HTTP/1.1" 404 Not Found
INFO: 10.244.0.15:35798 - "POST /v1/embeddings HTTP/1.1" 404 Not Found
INFO: 10.244.0.1:43268...
```
I may be able to make this work with `url_prefix`. Giving that a try.

On second thought, I still think there should be two endpoints by default for backwards compatibility (see the sketch below):...
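To illustrate the idea in Go terms (a hypothetical sketch, not Infinity's actual routing; the exact paths are illustrative), serving both endpoints is just registering the same handler twice:

```go
package main

import "net/http"

func main() {
	embeddings := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"object": "list", "data": []}`))
	})
	mux := http.NewServeMux()
	// OpenAI-compatible path.
	mux.Handle("/v1/embeddings", embeddings)
	// Legacy alias kept for backwards compatibility.
	mux.Handle("/embeddings", embeddings)
	http.ListenAndServe(":8000", mux)
}
```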
That can be handled by respecting the `HF_TOKEN` environment variable to automatically download auth-gated models. That's how vLLM and other OSS projects do it.
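A minimal Go sketch of that convention (names are illustrative; this is not KubeAI's actual loader):

```go
package main

import (
	"fmt"
	"os"
)

// serverEnv builds the environment for the model server, forwarding
// HF_TOKEN when the operator has set it so auth-gated Hugging Face
// models download without any extra configuration.
func serverEnv() map[string]string {
	env := map[string]string{}
	if token, ok := os.LookupEnv("HF_TOKEN"); ok {
		env["HF_TOKEN"] = token
	}
	return env
}

func main() {
	fmt.Println(serverEnv())
}
```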