
[BUG]: can't upload .pdf or .docx files that are only 2~3MB with ollama embedding

Open · zxjhellow2 opened this issue 10 months ago

How are you running AnythingLLM?

Docker (remote machine)

What happened?

Image

Are there known steps to reproduce?

some settings:

Image

Image

The steps to run AnythingLLM:

export STORAGE_LOCATION=/home/zhaojh/anythingllm &&
mkdir -p $STORAGE_LOCATION &&
touch "$STORAGE_LOCATION/.env"

sudo docker run -d -p 3001:3001 \
--cap-add SYS_ADMIN \
-v ${STORAGE_LOCATION}:/app/server/storage \
-v ${STORAGE_LOCATION}/.env:/app/server/.env \
-e STORAGE_DIR="/app/server/storage" \
mintplexlabs/anythingllm
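
As a quick sanity check, standard Docker CLI commands can confirm the container came up and show any upload or embedding errors in its logs (image tag as used above):

# list containers started from the AnythingLLM image
docker ps --filter ancestor=mintplexlabs/anythingllm

# follow the logs of the most recently created container
docker logs -f $(docker ps -lq)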

And then run the Docker container:

docker run -d \
--name anythingllm \
--add-host=host.docker.internal:host-gateway \
--env STORAGE_DIR=/app/server/storage \
--health-cmd "/bin/bash /usr/local/bin/docker-healthcheck.sh || exit 1" \
--health-interval 60s \
--health-start-period 60s \
--health-timeout 10s \
-p 3001:3001/tcp \
--restart=always \
--user anythingllm \
-v ${STORAGE_LOCATION}:/app/server/storage \
-v ${STORAGE_LOCATION}/.env:/app/server/.env \
-w /app \
mintplexlabs/anythingllm
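
Since the container reaches Ollama on the host through --add-host=host.docker.internal:host-gateway, a quick connectivity check from inside the container can rule out networking as the cause of the failed uploads (this assumes curl is available in the image and Ollama is on its default port 11434):

# list the models Ollama exposes, as seen from inside the AnythingLLM container
docker exec anythingllm curl -s http://host.docker.internal:11434/api/tags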

zxjhellow2 avatar Feb 20 '25 09:02 zxjhellow2

I tried other embedding models and the problem was solved. I really want to know the principles for choosing embedding models.

zxjhellow2 avatar Feb 20 '25 11:02 zxjhellow2

Today I tried to use nomic-embed-text:latest as the embedding model in Ollama. After uploading some of the data, continuing to upload files caused the same problem.

Image

Image

zxjhellow2 avatar Feb 21 '25 02:02 zxjhellow2

I've tried other files and noticed that some files will upload and some won't. Is there a requirement for the files?

zxjhellow2 avatar Feb 21 '25 02:02 zxjhellow2

It would appear that if that specific model is not working, the result object is incorrect. Are you able to embed text via Ollama's API directly? It appears the response object is not typical of an embedding model's output. I have not used that model before, so I am not aware of its nuances.
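
For example, a minimal direct check against Ollama's embeddings endpoint (assuming Ollama listens on its default port 11434; substitute whatever model tag AnythingLLM is configured with):

# a healthy response is a JSON object containing an "embedding" array of floats
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text:latest", "prompt": "hello world"}'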

timothycarambat avatar Feb 21 '25 07:02 timothycarambat

Ok, I'll try to embed the text via Ollama's API afterward. Thank you.

zxjhellow2 avatar Feb 21 '25 07:02 zxjhellow2