
[BUG] Wrong ollama embedding endpoint

Open r0kk opened this issue 1 year ago • 8 comments

Description

Hi, I think you are calling the wrong endpoint for local embedding with Ollama when using the settings from your instructions here.

From the official Ollama API documentation here, the endpoint to call is http://localhost:11434/api/embed, but kotaemon calls http://localhost:11434/api/embeddings.

The following works: curl http://localhost:11434/api/embed -d '{ "model": "", "input": "Why is the sky blue?" }'

The following does not: curl http://localhost:11434/api/embeddings -d '{ "model": "", "input": "Why is the sky blue?" }'

There is also a problem that the UI doesn't show any notification about such issues, so one has to look into the logs. It would be great if errors were a bit more explicit.
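For reference, if I'm reading the current Ollama API docs right, the two native endpoints take differently shaped bodies (nomic-embed-text below is just an example model name; any pulled model should do):

# newer endpoint: takes "input" (string or list of strings), returns "embeddings"
curl http://localhost:11434/api/embed -d '{ "model": "nomic-embed-text", "input": "Why is the sky blue?" }'

# legacy endpoint: takes "prompt" (a single string), returns "embedding"
curl http://localhost:11434/api/embeddings -d '{ "model": "nomic-embed-text", "prompt": "Why is the sky blue?" }'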

Reproduction steps

1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

Screenshots

No response

Logs

No response

Browsers

No response

OS

No response

Additional information

No response

r0kk avatar Nov 15 '24 09:11 r0kk

The examples we used follow the Ollama OpenAI API specs: https://github.com/ollama/ollama/blob/main/docs/openai.md#curl

Please use the Test connection feature to make sure the Ollama connection is working properly for both LLM & embedding models.
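For reference, a minimal call against that OpenAI-compatible route would look roughly like this (a sketch following the linked doc; nomic-embed-text is just an example model):

curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{ "model": "nomic-embed-text", "input": "Why is the sky blue?" }'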

taprosoft avatar Nov 15 '24 11:11 taprosoft

It worked before, but Ollama changed the endpoint from /embeddings to /embed, so the OpenAI client should not work anymore because it uses /embeddings. At least this is my understanding. [Screenshot 2024-11-15 125451]

The OpenAI client endpoint: [Screenshot 2024-11-15 125809]

r0kk avatar Nov 15 '24 11:11 r0kk

Same issue here with ollama version 0.4.1 (per ollama -v) and the kotaemon full Docker image.

Neurozone avatar Nov 15 '24 13:11 Neurozone

+1

arno4000 avatar Nov 18 '24 09:11 arno4000

Is this just an issue with Ollama v0.4?

ollama -v
ollama version is 0.3.9

Call from within Kotaemon app docker runtime:

root@justin-two-towers:/app# curl localhost:11434/api/embeddings -d '{ "model": "llama3.1:8b", "input": "Why is the sky blue?" }'
{"embedding":[]}

Seems fine ...
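(Note the empty "embedding":[] above, though: if the legacy endpoint expects a "prompt" field, the "input" key may simply be ignored and an empty vector returned. A quick sanity check, assuming the same model is pulled:

curl localhost:11434/api/embeddings -d '{ "model": "llama3.1:8b", "prompt": "Why is the sky blue?" }'

This should return a non-empty "embedding" array if the endpoint itself is healthy.)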

vap0rtranz avatar Nov 18 '24 14:11 vap0rtranz

@r0kk is correct. I did a test just now on a Qwen2.5-7B-Instruct model that I wanted to use for embedding. It is running on-prem in a separate Ollama Docker container. https://ollama.com/library/qwen2.5:7b

Steps I took to reproduce:

# to prep to execute a command in docker
docker exec -it ollama /bin/bash

# install curl inside docker
apt-get update
apt-get install -y curl

# the /embeddings endpoint returned an empty array
curl -X POST http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "input": ["Why is the sky blue?", "Why is the grass green?"]
  }'
> {"embedding":[]}

# but the /embed endpoint succeeded
curl -X POST http://localhost:11434/api/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "input": ["Why is the sky blue?", "Why is the grass green?"]
  }'

# output:
> {"model":"qwen2.5:7b","embeddings":[[-0.0009619982,0.011638082,0.0019495427,0.0025457952,-0.0059983553,0.0056869467,0.0049384357,0.0036752485,-0.0041196514,0.031427655,-0.010701078,0.0053785336,0.0060591 ......

The reason why the default nomic-embed-text works with the /embeddings endpoint is just that it happens to use the old /embeddings endpoint, as evidenced here: https://ollama.com/library/nomic-embed-text

christopherkao avatar Apr 01 '25 21:04 christopherkao

@vap0rtranz I think the difference is that the /embeddings endpoint expects a prompt, whereas /embed expects an input.

I think we both accidentally passed input when we should have passed prompt.

https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embedding
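To illustrate, swapping "input" for "prompt" in the earlier failing call should produce a non-empty vector (a sketch; note the legacy endpoint embeds a single string, not a list):

curl -X POST http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "prompt": "Why is the sky blue?"
  }'

# expected shape: {"embedding":[...]}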

When I tried to swap out the embedding model from nomic-embed-text to qwen2.5:7b running on ollama on-prem, I ran into this error for POST /v1/embeddings: "llm embedding error: Failed to create new sequence: no input provided"

[Screenshot]

This actually reminds me of the problem we are noticing here, where the /embeddings endpoint expects a prompt, not an input.

christopherkao avatar Apr 01 '25 22:04 christopherkao

Actually, I think we are all looking at the wrong endpoint. This other thread talks about the difference between /api/embeddings vs /v1/embeddings: https://github.com/ollama/ollama/issues/7242.

I think that Kotaemon is using the /v1/embeddings route, as evidenced in the screenshot at the top of this post.
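To summarize the three routes as I understand them (request and response shapes per the Ollama docs and that issue, so treat this as a sketch rather than gospel):

# native (new):      POST /api/embed       -d '{"model": "...", "input": "..."}'   -> {"embeddings": [[...]]}
# native (legacy):   POST /api/embeddings  -d '{"model": "...", "prompt": "..."}'  -> {"embedding": [...]}
# OpenAI-compatible: POST /v1/embeddings   -d '{"model": "...", "input": "..."}'   -> {"data": [{"embedding": [...]}]}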

christopherkao avatar Apr 01 '25 22:04 christopherkao