
Failed to upload files in workspace with failed to embed error

Open stanltam opened this issue 1 year ago • 4 comments

LLM: ollama:latest
Embedding: AnythingLLM Embedder
Error Log:

[TELEMETRY SENT] { event: 'documents_embedded_in_workspace', distinctId: '4aea0721-0b3f-4014-81dd-d78c248f6b7d', properties: { LLMSelection: 'ollama', Embedder: 'native', VectorDbSelection: 'lancedb' } }
-- Working testing.pdf --
-- Parsing content from pg 1 --
-- Parsing content from pg 2 --
-- Parsing content from pg 3 --
-- Parsing content from pg 4 --
[SUCCESS]: testing.pdf converted & ready for embedding.

Document testing.pdf uploaded processed and successfully. It is now available in documents.
[TELEMETRY SENT] { event: 'document_uploaded', distinctId: '4aea0721-0b3f-4014-81dd-d78c248f6b7d', properties: {} }
Adding new vectorized document into namespace test
Chunks created from document: 11
[INFO] The native embedding model has never been run and will be downloaded right now. Subsequent runs will be faster. (~23MB)

Failed to load the native embedding model: TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11730:11)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at process.processImmediate (node:internal/timers:447:9)
    at async getModelFile (file:///app/server/node_modules/@xenova/transformers/src/utils/hub.js:470:24)
    at async getModelJSON (file:///app/server/node_modules/@xenova/transformers/src/utils/hub.js:574:18)
    at async Promise.all (index 0)
    at async loadTokenizer (file:///app/server/node_modules/@xenova/transformers/src/tokenizers.js:52:16)
    at async AutoTokenizer.from_pretrained (file:///app/server/node_modules/@xenova/transformers/src/tokenizers.js:3920:48)
    at async Promise.all (index 0) {
  cause: ConnectTimeoutError: Connect Timeout Error
      at onConnectTimeout (node:internal/deps/undici/undici:6869:28)
      at node:internal/deps/undici/undici:6825:50
      at Immediate._onImmediate (node:internal/deps/undici/undici:6857:13)
      at process.processImmediate (node:internal/timers:476:21) {
    code: 'UND_ERR_CONNECT_TIMEOUT'
  }
}
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11730:11)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at process.processImmediate (node:internal/timers:447:9)
    at async getModelFile (file:///app/server/node_modules/@xenova/transformers/src/utils/hub.js:470:24)
    at async getModelJSON (file:///app/server/node_modules/@xenova/transformers/src/utils/hub.js:574:18)
    at async Promise.all (index 0)
    at async loadTokenizer (file:///app/server/node_modules/@xenova/transformers/src/tokenizers.js:52:16)
    at async AutoTokenizer.from_pretrained (file:///app/server/node_modules/@xenova/transformers/src/tokenizers.js:3920:48)
    at async Promise.all (index 0) {
  cause: ConnectTimeoutError: Connect Timeout Error
      at onConnectTimeout (node:internal/deps/undici/undici:6869:28)
      at node:internal/deps/undici/undici:6825:50
      at Immediate._onImmediate (node:internal/deps/undici/undici:6857:13)
      at process.processImmediate (node:internal/timers:476:21) {
    code: 'UND_ERR_CONNECT_TIMEOUT'
  }
}
addDocumentToNamespace fetch failed
Failed to vectorize custom-documents/testing.pdf-36edd41c-d784-454f-a885-a4b73a558ce7.json
[TELEMETRY SENT] { event: 'documents_embedded_in_workspace', distinctId: '4aea0721-0b3f-4014-81dd-d78c248f6b7d', properties: { LLMSelection: 'ollama', Embedder: 'native', VectorDbSelection: 'lancedb' } }

stanltam avatar Jan 06 '24 01:01 stanltam

@stanltam Are you running AnythingLLM in a docker container on a MacBook with an M-series chip?

timothycarambat avatar Jan 06 '24 02:01 timothycarambat

I'm getting the same error. I run AnythingLLM in a Docker container on an Ubuntu system.

bioone avatar Jan 06 '24 03:01 bioone

@bioone This error is not coming from AnythingLLM. It's a timeout in xenova/transformers while downloading the model from Hugging Face. If you are having trouble embedding with the native embedder, you can swap to any other provider and re-embed your documents.

Will look into whether we can either pre-bake the native embedder into the image or increase the timeout.

timothycarambat avatar Jan 06 '24 03:01 timothycarambat
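For anyone trying to confirm that a failure like the one in the log is a network timeout rather than an application bug, here is a minimal sketch. It relies only on what the log shows: a `TypeError: fetch failed` whose `cause` carries `code: 'UND_ERR_CONNECT_TIMEOUT'`. The names `isConnectTimeout`, `describeLoadFailure`, and `loadModel` are hypothetical and are not part of AnythingLLM or xenova/transformers.

```javascript
// Walk an error's `cause` chain and report whether any link carries
// the undici connect-timeout code that appears in the log.
function isConnectTimeout(err) {
  const seen = new Set(); // guards against a circular cause chain
  let current = err;
  while (current && typeof current === "object" && !seen.has(current)) {
    if (current.code === "UND_ERR_CONNECT_TIMEOUT") return true;
    seen.add(current);
    current = current.cause;
  }
  return false;
}

// Hypothetical wrapper: `loadModel` stands in for whatever async call
// triggers the model download in your own code.
async function describeLoadFailure(loadModel) {
  try {
    await loadModel();
    return "ok";
  } catch (err) {
    return isConnectTimeout(err)
      ? "network: connection to the model host timed out"
      : `other: ${err.message}`;
  }
}
```

If the helper reports a network timeout, the fix is more likely on the connectivity side (proxy, firewall, blocked host) than in the workspace or vector database settings.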

@timothycarambat OK, I hope you can pre-bake the native embedder, it's very useful!!!

bioone avatar Jan 06 '24 11:01 bioone

Hi there! Any updates on this issue? I still can't embed PDF files in a workspace (using LanceDB). I'm using LM Studio with AnythingLLM...

Rainmanqxy avatar Apr 02 '24 14:04 Rainmanqxy

I deployed using Docker, but encountered an error when importing files, indicating a timeout.

[INFO] The native embedding model has never been run and will be downloaded right now. Subsequent runs will be faster. (~23MB)


Failed to load the native embedding model: TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11731:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async getModelFile (file:///app/server/node_modules/@xenova/transformers/src/utils/hub.js:471:24)
    at async getModelJSON (file:///app/server/node_modules/@xenova/transformers/src/utils/hub.js:575:18)
    at async Promise.all (index 0)
    at async loadTokenizer (file:///app/server/node_modules/@xenova/transformers/src/tokenizers.js:61:18)
    at async AutoTokenizer.from_pretrained (file:///app/server/node_modules/@xenova/transformers/src/tokenizers.js:4296:50)
    at async Promise.all (index 0)
    at async loadItems (file:///app/server/node_modules/@xenova/transformers/src/pipelines.js:3115:5)
    at async pipeline (file:///app/server/node_modules/@xenova/transformers/src/pipelines.js:3055:21) {
  cause: ConnectTimeoutError: Connect Timeout Error
      at onConnectTimeout (node:internal/deps/undici/undici:6869:28)
      at node:internal/deps/undici/undici:6825:50
      at Immediate._onImmediate (node:internal/deps/undici/undici:6857:13)
      at process.processImmediate (node:internal/timers:476:21) {
    code: 'UND_ERR_CONNECT_TIMEOUT'
  }
}
addDocumentToNamespace fetch failed

warrior-dl avatar Apr 03 '24 06:04 warrior-dl

@warrior-dl see pinned issue. HF is blocking your IP https://github.com/Mintplex-Labs/anything-llm/issues/821

timothycarambat avatar Apr 03 '24 06:04 timothycarambat
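To check from inside the container whether the model host can be reached at all, a small probe along these lines may help. This is a sketch under assumptions: it uses the global `fetch` and `AbortSignal.timeout` available in recent Node.js versions, and `probeHost` and its parameters are hypothetical names, not an AnythingLLM utility.

```javascript
// Try a HEAD request against `url`, giving up after `timeoutMs`.
// `fetchImpl` is injectable so the helper can be exercised without network access.
async function probeHost(url, timeoutMs = 5000, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(url, {
      method: "HEAD",
      signal: AbortSignal.timeout(timeoutMs),
    });
    return { reachable: true, status: res.status };
  } catch (err) {
    // undici attaches the low-level reason to err.cause; an aborted
    // request surfaces as an error named "TimeoutError".
    const reason = (err.cause && err.cause.code) || err.name || "unknown";
    return { reachable: false, reason };
  }
}

// Example (needs network access):
// probeHost("https://huggingface.co").then(console.log);
```

A result of `reachable: false` from the same environment that runs the server would be consistent with the blocked-IP explanation above, though it cannot distinguish a block from other network problems.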

@warrior-dl see pinned issue. HF is blocking your IP #821

Thank you very much, I successfully imported the file after manually downloading the model.

warrior-dl avatar Apr 03 '24 06:04 warrior-dl

@warrior-dl Hi, could you please tell me how to manually download the embedding model? Where should the model be downloaded?

ZiHAO-LI-cmd avatar Apr 09 '24 07:04 ZiHAO-LI-cmd

@warrior-dl Hi, could you please tell me how to manually download the embedding model? Where should the model be downloaded?

https://github.com/Mintplex-Labs/anything-llm/issues/821#issuecomment-1968382359

warrior-dl avatar Apr 19 '24 10:04 warrior-dl
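After a manual download, it can be worth verifying that the files actually landed where the server will look for them. The sketch below only checks for presence and non-zero size. Both the directory and the file list are assumptions: the linked comment is the authoritative source for the location, and `ASSUMED_FILES` reflects what Transformers.js commonly requests for a quantized ONNX model, which may differ for your version. `missingModelFiles` is a hypothetical helper.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumed file list; adjust to match the instructions in the linked comment.
const ASSUMED_FILES = [
  "config.json",
  "tokenizer.json",
  "tokenizer_config.json",
  path.join("onnx", "model_quantized.onnx"),
];

// Return the entries of `expectedFiles` that are missing or empty under `modelDir`.
function missingModelFiles(modelDir, expectedFiles = ASSUMED_FILES) {
  return expectedFiles.filter((relative) => {
    try {
      const stat = fs.statSync(path.join(modelDir, relative));
      return !stat.isFile() || stat.size === 0;
    } catch {
      return true; // not found or unreadable
    }
  });
}

// Example (hypothetical path):
// console.log(missingModelFiles("/path/to/your/model/dir"));
```

An empty array means every expected file is present and non-empty; it does not prove the files are the right ones or that they are uncorrupted.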