Ollama embeddings "crash" when embedding a repo
Using Ollama embeddings (nomic-embed-text) on a GitHub repo, embedding stops after about 5 minutes. Anything LLM briefly displays an error message saying 108 documents failed to load (there are 120 in the repo), then the message disappears. I managed to capture a screenshot (attached). There is no information available in the "event logs" within Anything LLM, as these appear to only cover workspace documents being added or removed. I have not been able to locate any other Anything LLM log that could give more information.
The vector DB is LanceDB.
This has happened three times now with Anything LLM. FYI, the Ollama server log is attached.
I have successfully (but slowly) embedded this repo several times into a local FAISS DB using my own code.
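For reference, a minimal sketch of that approach, assuming LangChain with OllamaEmbeddings and FAISS (the paths and chunk sizes here are illustrative, not my exact settings):

```python
# Sketch: embed a cloned repo into a local FAISS index via Ollama.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every file in the cloned repo as plain text.
loader = DirectoryLoader("./repo", glob="**/*", loader_cls=TextLoader,
                         silent_errors=True)
docs = loader.load()

# Chunk the documents; these sizes are boilerplate defaults.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# nomic-embed-text produces 768-dimensional vectors.
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Embed the chunks and build the index, then persist it locally.
db = FAISS.from_documents(chunks, embeddings)
db.save_local("./faiss_index")
```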
A couple of questions to help figure out where this is coming from:
- Did this happen on a workspace that had ever had anything embedded in it prior?
- Were any of the documents to be embedded already cached?
- Did you ever use any embedder other than the Ollama embedder with nomic?
- Does this problem persist with a single document, or only when uploading 100+?
I do see this in the logs:
time=2024-05-11T11:23:05.819+10:00 level=INFO source=routes.go:405 msg="embedding generation failed: no slots available after 10 retries"
This would indicate that a 0-length vector may be returned due to a failure. That could be the issue, and LanceDB is just saying "I won't embed a 0-length vector when the dimensions should be 768."
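Roughly what I mean, as a sketch (the embed_chunk callable and the dimension check are illustrative, not AnythingLLM's actual code):

```python
# Sketch: validate embedding dimensions before handing vectors to the vector DB.
EXPECTED_DIM = 768  # nomic-embed-text output dimension

def safe_embed(chunks, embed_chunk):
    """Embed chunks, separating out any that come back the wrong size."""
    good, failed = [], []
    for chunk in chunks:
        vector = embed_chunk(chunk)  # may come back empty if Ollama has no free slot
        if vector and len(vector) == EXPECTED_DIM:
            good.append((chunk, vector))
        else:
            failed.append(chunk)  # a 0-length vector would land here
    return good, failed
```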
What would cause a zero-length vector? In regard to your questions:
I had the error on two workspaces. The first may have had embeddings made with the default embedder, but I deleted everything and started again. The most recent attempt was on a clean workspace.
In terms of caching, I don't know. What I do know is that I retrieved the repo multiple times.
I did manage, I think, to embed the same repo with the default embedding model.
I did get Ollama embeddings to work with a much smaller repo that had only 3 files.
The error from Ollama also seems to indicate that Ollama ran out of memory and started failing to return embeddings. This would make sense: when we sent 108 files at once to Ollama, it probably crashed.
Going to close for now, as I think this might just be a usage issue since it does still work at lower volumes.
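If that is the failure mode, throttling would be the workaround; a sketch of the idea (batch_size, pause, and the embed callable are assumptions, not AnythingLLM internals):

```python
# Sketch: throttle embedding requests instead of firing all 108 at once.
import time

def embed_in_batches(chunks, embed, batch_size=8, pause=0.5):
    """Send small batches sequentially so Ollama's slots are never exhausted."""
    vectors = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        vectors.extend(embed(text) for text in batch)
        time.sleep(pause)  # give the server a moment to free slots
    return vectors
```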
OK, but Ollama embeds this without failure outside of Anything LLM.
How are you doing this outside of AnythingLLM? Are you sending each document one at a time, in order?
FYI, it also failed the same way with Chroma inside Anything LLM.
The attached dog's breakfast (Lang_main.py) is how I've done it with the same repo outside of Anything LLM, using FAISS. I have also successfully (and very slowly) processed the much larger Autogen repo (~800 docs) the same way. The settings in my config file are pretty much boilerplate; I'm having trouble with the speed of the embeddings (~20 minutes to embed the repo I'm trying in Anything LLM), so there is room for improvement.
I am getting similar results. If I send more than 5 files at a time to embed, I get a similar issue on an Intel PC with an Nvidia card. The PC is only 2 years old, with a nifty new Nvidia 4070 Super in it. The computer reboots as if the "reset" switch was hit.
The idea that documents are somehow NOT sent in sequence is baffling. I have not pulled this string all the way yet. I suspect there may be some "documents" in that repo that don't work with the one-size-fits-all splitting and chunking used by Anything LLM. I will note the following for now:
Importing a GitHub repo includes a lot of crap you don't need. In Anything LLM you can "exclude" files and folders. When I looked at what was in the repo and excluded what I didn't need (reducing the file count to 68), it worked. I don't think the issue is the number of files; that's just the easiest thing to blame. As Anything LLM users, there are a lot of things we can't control and don't have visibility into. When I tried the entire repo of 112 files: fail. When I eliminated files I didn't need and embedded 68 files: pass. Outside of Anything LLM, embedding the ENTIRE repo works without removing any files.
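As a rough illustration, the exclusion step amounts to something like this (the skip lists are examples of the kind of thing I excluded, not an authoritative set):

```python
# Sketch: mimic AnythingLLM's "exclude" step by filtering the repo ourselves.
from pathlib import Path

SKIP_DIRS = {".git", ".github", "node_modules", "__pycache__"}
SKIP_EXTS = {".png", ".jpg", ".ico", ".svg", ".lock"}

def relevant_files(repo_root):
    """Yield only the files worth embedding."""
    for path in Path(repo_root).rglob("*"):
        if not path.is_file():
            continue
        if SKIP_DIRS & set(path.parts):  # skip anything under an excluded dir
            continue
        if path.suffix.lower() in SKIP_EXTS:
            continue
        yield path

files = list(relevant_files("./repo"))
print(f"{len(files)} files kept for embedding")
```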
Ollama embeddings using nomic-embed-text are faster inside Anything LLM than outside, for the exact same set of documents. I haven't looked into this for a while, but I intend to come back and run some experiments to gather more information. There was, and may still be, an issue with Ollama not using the GPU (or enough of it) when doing the embeddings.
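When I do come back to it, something like this quick throughput check against Ollama's /api/embeddings endpoint (the workload here is a placeholder), run while watching nvidia-smi, should show whether the GPU is actually being used:

```python
# Sketch: rough embeddings-per-second measurement against a local Ollama server.
import time
import requests

URL = "http://localhost:11434/api/embeddings"
texts = ["sample chunk of code or prose"] * 50  # placeholder workload

start = time.time()
for text in texts:
    resp = requests.post(URL, json={"model": "nomic-embed-text", "prompt": text})
    resp.raise_for_status()
    assert len(resp.json()["embedding"]) == 768  # nomic-embed-text dimension
elapsed = time.time() - start
print(f"{len(texts) / elapsed:.1f} embeddings/sec")
```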