
Ollama embeddings "crash" when embedding a repo

Open · Freffles opened this issue 9 months ago · 3 comments

Using Ollama embeddings with nomic-embed-text on a GitHub repo, embedding stops after about 5 minutes. Anything LLM briefly displays an error message saying 108 documents failed to load (there are 120 in the repo), then the message disappears. I managed to capture a screenshot (attached). There is no information available in the "event logs" within Anything LLM, as these appear to only cover workspace documents being added or removed. I have not been able to locate any other Anything LLM log that would give more information.

The vector DB is LanceDB.

This has happened three times now with Anything LLM. FYI, the Ollama server log is attached.

I have successfully (but slowly) embedded this repo several times into a local FAISS DB using my own code.

[screenshot] ollama_logs.txt

Freffles avatar May 11 '24 01:05 Freffles

A couple of questions to help figure out where this is coming from:

  • Did this happen on a workspace that had ever had anything embedded in it prior?
  • Were any of the documents to be embedded already cached?
  • Did you ever use any embedder other than the Ollama embedder with nomic?
  • Does this problem persist with a single document, or only when uploading 100+?

timothycarambat avatar May 11 '24 04:05 timothycarambat

I do see this in the logs:

time=2024-05-11T11:23:05.819+10:00 level=INFO source=routes.go:405 msg="embedding generation failed: no slots available after 10 retries"

This would indicate that a 0-length vector is being returned when generation fails. That could be the issue: LanceDB is effectively saying "I won't embed a 0-length vector when the dimensions should be 768".
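For illustration, a minimal sketch (not AnythingLLM's actual code) of the kind of guard that would catch this, assuming the default Ollama endpoint and nomic-embed-text's 768 dimensions:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default Ollama endpoint
EXPECTED_DIM = 768  # nomic-embed-text output dimension

def embed_chunk(text: str) -> list[float]:
    """Request an embedding from Ollama and fail loudly on a bad vector."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=120,
    )
    resp.raise_for_status()
    vector = resp.json().get("embedding", [])
    # A failed generation ("no slots available") can surface as an empty
    # vector; LanceDB would then reject the insert on a dimension mismatch.
    if len(vector) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dims, got {len(vector)}")
    return vector
```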

timothycarambat avatar May 11 '24 04:05 timothycarambat

What would a zero-length vector be? Regarding your questions:

I had the error on two workspaces. The first may have had embeddings from the default embedder, but I deleted everything and started again. The most recent attempt was a clean workspace.

In terms of caching, I don't know. What I do know is that I retrieved the repo multiple times.

I did manage, I think, to embed the same repo with the default embeddings model.

I did get Ollama embeddings to work with a much smaller repo that had only 3 files.

Freffles avatar May 11 '24 06:05 Freffles

The error from Ollama also seems to indicate that Ollama ran out of memory and started failing to return embeddings. This would make sense: when we sent 108 files at once to Ollama, it probably crashed.

Going to close for now, as I think this might just be a usage issue, since it does still work at lower volumes?
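If it is saturation, a hedged sketch of the kind of client-side throttle and retry that would avoid it (the names and delays here are placeholders, not AnythingLLM's actual behavior):

```python
import time
from typing import Callable, Iterable

def embed_with_backoff(embed_fn: Callable[[str], list[float]], text: str,
                       retries: int = 5, base_delay: float = 1.0) -> list[float]:
    """Retry a failed embedding call with exponential backoff, since
    'no slots available' suggests the server was momentarily saturated."""
    for attempt in range(retries):
        try:
            return embed_fn(text)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

def embed_sequentially(chunks: Iterable[str],
                       embed_fn: Callable[[str], list[float]]) -> list[list[float]]:
    """Send chunks one at a time so a single-slot Ollama server
    never sees a burst of parallel requests."""
    return [embed_with_backoff(embed_fn, chunk) for chunk in chunks]
```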

timothycarambat avatar May 11 '24 19:05 timothycarambat

OK, but Ollama embeds this without failure outside of Anything LLM.

Freffles avatar May 11 '24 21:05 Freffles

How are you doing this outside of AnythingLLM? Sending each document one at a time, in order?

timothycarambat avatar May 12 '24 03:05 timothycarambat

FYI, it also failed the same way with Chroma inside Anything LLM.

The attached dog's breakfast (Lang_main.py) is how I've done it with the same repo outside of Anything LLM, using FAISS. I have also successfully (and very slowly) processed the much larger Autogen repo (~800 docs) the same way. The settings in my config file are pretty much boilerplate, as I'm having trouble with the speed of the embeddings (~20 minutes to embed the repo I'm trying in Anything LLM), so there is room for improvement.

config.txt

Lang_main.txt
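Roughly, the approach is along these lines, as a minimal sketch of embedding a repo into FAISS with LangChain and Ollama; this is not the attached script itself, and the paths and chunk sizes are placeholders:

```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every Python file from a local checkout of the repo.
loader = DirectoryLoader("./repo", glob="**/*.py", loader_cls=TextLoader)
docs = loader.load()

# Split into overlapping chunks before embedding.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# LangChain's FAISS wrapper embeds the chunks via Ollama, which in
# practice means one request at a time rather than a 100-file burst.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
db = FAISS.from_documents(chunks, embeddings)
db.save_local("faiss_index")
```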

Freffles avatar May 12 '24 08:05 Freffles

> The error from Ollama also seems to indicate that Ollama ran out of memory and started failing to return embeddings. This would make sense: when we sent 108 files at once to Ollama, it probably crashed.
>
> Going to close for now, as I think this might just be a usage issue, since it does still work at lower volumes?

I am getting similar results. If I send more than 5 files at a time to embed, I get a similar issue with an Intel PC and an Nvidia card. The PC is only 2 years old, with a nifty new Nvidia 4070 Super in it. The computer reboots as if the "reset" switch was hit.

LindsayRex avatar Jun 06 '24 11:06 LindsayRex

The idea that somehow documents are NOT sent in sequence is baffling. I have not pulled this string all the way yet. I suspect there may be some "documents" in that repo that don't work with the one-size-fits-all splitting and chunking used by Anything LLM. I will note the following for now:

Importing a GitHub repo includes a lot of crap you don't need. In Anything LLM you can "exclude" files and folders. When I looked at what was in the repo and excluded what I didn't need (reducing the file count to 68), it worked. I don't think the issue is the number of files; that's just the easiest thing to blame. As Anything LLM users, there are a lot of things we can't control and don't have visibility of. When I tried the entire repo of 112 files: Fail. When I eliminated files I didn't need and embedded 68 files: Pass. Outside of Anything LLM, embedding the ENTIRE repo works without the need to remove any files.
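To illustrate, the exclusion amounts to a filter like this; the patterns here are hypothetical examples, not the ones I actually used:

```python
import fnmatch

# Hypothetical exclusion patterns; Anything LLM exposes this as an
# "exclude files/folders" option when importing a repo.
EXCLUDE = ["*.lock", "*.png", ".github/*", "tests/*", "docs/*"]

def keep(path: str) -> bool:
    """Keep a file only if it matches none of the exclusion patterns."""
    return not any(fnmatch.fnmatch(path, pattern) for pattern in EXCLUDE)

files = ["src/main.py", "docs/readme.md", "tests/test_main.py"]
print([f for f in files if keep(f)])  # -> ['src/main.py']
```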

Ollama embeddings using nomic-embed-text are faster inside Anything LLM than outside for the exact same set of documents. I have not looked into this for a while, but I intend to come back to this and run some experiments to gather more information. There was, and may still be, an issue with Ollama not using the GPU (or enough GPU) when doing the embeddings.

Freffles avatar Jun 07 '24 22:06 Freffles