Sam Stoelinga

Results: 223 comments of Sam Stoelinga

I can reproduce with a simple curl command as well:

```
curl -v http://localhost:8000/openai//v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2-500m-cpu", "prompt": "Who was the first president...
```

`-L` with curl doesn't work either and returns the 400 error. I will rewrite the integration test to use `-L` as well. Good catch, all! Full output:

```
curl -v...
```

The root cause seems to be that a POST gets automatically redirected to a GET when a 301 is used: https://datatracker.ietf.org/doc/html/rfc7231#section-6.4.2 We should use an HTTP 307 or 308 to keep it as a...
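For background, here is a minimal sketch of the difference (illustrative server and client, not KubeAI code): compliant clients rewrite a redirected POST into a GET on 301/302, but preserve the method and body on 307/308.

```python
# Sketch: a client following a 301 re-issues the POST as a GET,
# while a 308 preserves the method and body.
import http.server
import threading

import requests


class Redirector(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.path == "/target":
            self._reply(f"POST /target body={body.decode()}")
        else:
            # /301 answers with a 301, /308 with a 308; both point at /target.
            self.send_response(301 if self.path == "/301" else 308)
            self.send_header("Location", "/target")
            self.end_headers()

    def do_GET(self):
        # Only reached when the client downgraded the redirected POST to GET.
        self._reply("GET /target")

    def _reply(self, text):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(text.encode())

    def log_message(self, *args):  # silence request logging
        pass


server = http.server.HTTPServer(("127.0.0.1", 0), Redirector)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

print(requests.post(f"{base}/301", data="x").text)  # -> GET /target
print(requests.post(f"{base}/308", data="x").text)  # -> POST /target body=x
server.shutdown()
```

curl behaves the same way: following a 301 with `-L` re-issues the request as a GET unless `--post301` is given, which lines up with the 400s above.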

It's an issue with curl as well. The weird thing is that my integration test doesn't reproduce it, but I can very much reproduce the 400 error in my local...

I was able to reproduce in automated testing as well: https://github.com/substratusai/kubeai/actions/runs/11127386385/job/30919572183?pr=259#step:6:593

In the past I only had to use custom chat templates for specific models.

Seems the response isn't exactly following the OpenAI response format. This is from the OpenAI docs:

![image](https://github.com/user-attachments/assets/607def22-7262-4411-b079-7fc2ed12ad69)

And this is what Infinity returns:

```
{
  "object": "embedding",
  "data": [
    {
      "object": "embedding",...
```
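For comparison, a sketch of the response shape the OpenAI embeddings docs describe (values are illustrative): the top-level `object` is `"list"`, and only the items inside `data` are `"embedding"`.

```python
# Shape per the OpenAI embeddings docs (values are illustrative):
# the top-level "object" is "list"; only items in "data" are "embedding".
expected_response = {
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "index": 0,
            "embedding": [0.0023064255, -0.009327292],  # truncated vector
        }
    ],
    "model": "text-embedding-ada-002",
    "usage": {"prompt_tokens": 8, "total_tokens": 8},
}
```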

I confirmed that this is blocking integration with KubeAI:

```
INFO: 10.244.0.15:35798 - "POST /v1/embeddings HTTP/1.1" 404 Not Found
INFO: 10.244.0.15:35798 - "POST /v1/embeddings HTTP/1.1" 404 Not Found
INFO: 10.244.0.1:43268...
```

I may be able to make this work with `url_prefix`. Giving that a try. On second thought, I still think there should be 2 endpoints by default for backwards compatibility:...

That can be handled by respecting the HF_TOKEN environment variable to automatically download auth-gated models. That's how vLLM and other OSS projects do it.
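A rough sketch of that convention (the repo ID is hypothetical, not from the thread):

```python
# Sketch of the HF_TOKEN convention: read the token from the environment
# and pass it through when fetching a gated model; when HF_TOKEN is unset,
# the download falls back to anonymous access.
import os

from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical gated model
    token=os.environ.get("HF_TOKEN"),  # None -> anonymous access
)
print(model_dir)
```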