
[BUG]: Ollama LLM + Embedder

Open timothycarambat opened this issue 1 year ago • 6 comments

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

When I have Ollama set as both my LLM and embedder, sending chats results in a bug where Ollama cannot be used for both services.

Both preference settings point to the same Ollama instance.

[Screenshot: IMG_0964]

Are there known steps to reproduce?

Set Ollama as the LLM and embedder. Embed documents. Send a chat.

timothycarambat avatar Mar 15 '24 20:03 timothycarambat

Currently unable to replicate using llama2 and nomic-embed as the LLM and embedder respectively.

timothycarambat avatar Mar 15 '24 20:03 timothycarambat

Hi @timothycarambat, I use AnythingLLM + Ollama as the LLM model and embedding model, but then I got an error saying: "No embedding model was set."

Did you get this error? How did you fix it?

franva avatar Mar 17 '24 01:03 franva

Good news, bad news:

The Ollama auto-loader occasionally doesn't seem to kick in when the first call it receives is an embedding request, before any language model call has been made.

If you manually run `ollama serve` from the terminal, or trigger a text interaction through a web UI first, then it processes normally.
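For anyone else hitting this, a rough warm-up sketch of that workaround (TypeScript, Node 18+; the base URL `http://localhost:11434` and the `llama2` model name are assumptions about a default local setup, not anything AnythingLLM ships):

```ts
// Hypothetical warm-up script: make Ollama load the chat model up front,
// so the first embedding call doesn't arrive before any generation call.
const OLLAMA_BASE = "http://localhost:11434"; // assumption: default Ollama port

async function warmUp(model: string): Promise<void> {
  // A tiny non-streaming /api/generate request is enough to pull the
  // model into memory, same as chatting once through a web UI.
  const res = await fetch(`${OLLAMA_BASE}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: "ping", stream: false }),
  });
  if (!res.ok) throw new Error(`Warm-up failed: ${res.status} ${await res.text()}`);
  console.log(`${model} is loaded`);
}

warmUp("llama2").catch(console.error); // assumption: llama2 is the chat model
```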

The bad news is that the nomic embedding is so fast that it blue-screens my machine with a DPC_WATCHDOG_VIOLATION error on PDFs over 100 MB with 5k pages. This is while upserting into Pinecone.

I don't want to start splitting documents manually. If there were a way to add a delay/buffer slider or otherwise slow it down, that would be great.
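Roughly what I have in mind, as a client-side sketch (TypeScript; the batch size, delay, model name, and URL are all placeholder assumptions, not options AnythingLLM exposes today):

```ts
// Throttling sketch: embed chunks in small batches with a pause between
// batches instead of letting everything hit Ollama at full speed.
const OLLAMA_BASE = "http://localhost:11434"; // assumption: default port
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function embedChunk(text: string): Promise<number[]> {
  const res = await fetch(`${OLLAMA_BASE}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  const { embedding } = await res.json();
  return embedding;
}

async function embedAllThrottled(chunks: string[], batchSize = 8, delayMs = 500) {
  const vectors: number[][] = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    vectors.push(...(await Promise.all(batch.map(embedChunk))));
    await sleep(delayMs); // give the GPU/driver room to breathe between batches
  }
  return vectors; // these would then be upserted into Pinecone
}
```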

FarVision2 avatar Mar 17 '24 03:03 FarVision2

You can simply watch the logs in each Docker container to see what's going on.

AnythingLLM hits the embedding API just fine; it just doesn't actually trigger loading of the model when it's used.

FarVision2 avatar Mar 17 '24 03:03 FarVision2

We just use the Ollama API; I don't think there is even an API to "ask" Ollama to swap over. It is supposed to do that automatically when loading the model. I wonder if mlock is enabled and it cannot load the next model because it has already loaded one.

You are not serving ollama with any special ENVs or anything, right?
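For reference, the two calls in question boil down to something like this minimal sketch (TypeScript; the port and model names are assumptions about a default local setup). Against a freshly started Ollama instance, both requests should trigger automatic model loading without any manual `ollama run` beforehand:

```ts
// Back-to-back test: embedding first (the order that seems to trip the
// auto-loader), then a generation against the same instance.
const OLLAMA_BASE = "http://localhost:11434"; // assumption: default Ollama port

async function post(path: string, body: unknown) {
  const res = await fetch(`${OLLAMA_BASE}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${path} -> ${res.status}`);
  return res.json();
}

(async () => {
  const emb = await post("/api/embeddings", { model: "nomic-embed-text", prompt: "hello" });
  console.log("embedding dims:", emb.embedding.length);

  const gen = await post("/api/generate", { model: "llama2", prompt: "Say hi", stream: false });
  console.log("generate response:", gen.response);
})();
```

If the second call only fails when the embedding goes first, that would point at the model-loading behavior rather than at AnythingLLM's configuration.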

timothycarambat avatar Mar 18 '24 15:03 timothycarambat

No, nothing different at all. I hit the same API with DifyCE and Open-WebUI, and different models swap in and out as expected. I used llama2 as both the language and embedding model just for grins, though of course it's not so great for embedding. I bounced the Docker container a few times and everything kicks on and off as expected every single time, except for AnythingLLM. :( I was trying to shake it out of the logs, but after the GET "/api/tags" it didn't actually process the workflow. I will poke around some more today.

FarVision2 avatar Mar 18 '24 17:03 FarVision2