
115 Illegal instruction

Open lishaojun616 opened this issue 1 year ago • 19 comments

AnythingLLM is installed on an Ubuntu server. In the system LLM settings, it can connect to the Ollama server and retrieve the models. But when chatting in a workspace, the Docker container exits. 1. The info shown in the browser: (image) 2. The Docker logs: "/usr/local/bin/docker-entrypoint.sh: line 7: 115 Illegal instruction (core dumped) node /app/server/index.js"

What's the problem?

lishaojun616 avatar May 10 '24 05:05 lishaojun616

This appears to be a Docker engine issue: https://github.com/Mintplex-Labs/anything-llm/issues/1290#issuecomment-2101960232

That issue was reported specifically on Mac, but the same occurs on Linux/Ubuntu as well.

timothycarambat avatar May 10 '24 17:05 timothycarambat

Hi @lishaojun616, have you managed to resolve the issue? I am experiencing the same situation as you described in #1323.

SyuanYo avatar May 14 '24 10:05 SyuanYo

Hello everyone,

I face the same problem.

Same setup: Ubuntu 22.04 LTS, using Ollama as the LLM.

I have installed the newest Docker engine and built anything-llm with docker-compose v2.

...
[TELEMETRY SENT] {
  event: 'workspace_created',
  distinctId: 'c060354a-b171-4702-b83a-9da2ef0612e4',
  properties: {
    multiUserMode: false,
    LLMSelection: 'ollama',
    Embedder: 'native',
    VectorDbSelection: 'lancedb',
    runtime: 'docker'
  }
}
[Event Logged] - workspace_created
[TELEMETRY SENT] {
  event: 'onboarding_complete',
  distinctId: 'c060354a-b171-4702-b83a-9da2ef0612e4',
  properties: { runtime: 'docker' }
}
[NativeEmbedder] Initialized
/usr/local/bin/docker-entrypoint.sh: line 7:   119 Illegal instruction     (core dumped) node /app/server/index.js
gitlab-runner@gradio:~$ docker --version
Docker version 26.1.3, build b72abbb

This issue is marked as closed. Is there a solution available?

Best regards

Joachim

joachimt-git avatar May 20 '24 06:05 joachimt-git

Encountering the same issue. Using Ubuntu Server 22.04 with Docker, Yarn, and Node installed as recommended in HOW_TO_USE_DOCKER.md#how-to-use-dockerized-anything-llm. Ollama is on another machine, serving at 0.0.0.0 (other remote apps function correctly with this setup, even in Docker).

EDIT: Forgot to mention:

Docker version 26.1.3, build b72abbb

Ubuntu is running in a VM

Experiencing the identical error as posted by @joachimt-git:

[Event Logged] - update_llm_provider
[Event Logged] - update_embedding_engine
[Event Logged] - update_vector_db
[TELEMETRY SENT] {
  event: 'enabled_multi_user_mode',
  distinctId: '20b022d1-14cc-490c-ab86-4f941a32f7bc',
  properties: { multiUserMode: true, runtime: 'docker' }
}
[Event Logged] - multi_user_mode_enabled
[TELEMETRY SENT] {
  event: 'login_event',
  distinctId: '20b022d1-14cc-490c-ab86-4f941a32f7bc::1',
  properties: { multiUserMode: false, runtime: 'docker' }
}
[Event Logged] - login_event
[TELEMETRY SENT] {
  event: 'workspace_created',
  distinctId: '20b022d1-14cc-490c-ab86-4f941a32f7bc::1',
  properties: {
    multiUserMode: true,
    LLMSelection: 'ollama',
    Embedder: 'native',
    VectorDbSelection: 'lancedb',
    runtime: 'docker'
  }
}
[Event Logged] - workspace_created
[TELEMETRY SENT] {
  event: 'onboarding_complete',
  distinctId: '20b022d1-14cc-490c-ab86-4f941a32f7bc',
  properties: { runtime: 'docker' }
}
[NativeEmbedder] Initialized
/usr/local/bin/docker-entrypoint.sh: line 7:   117 Illegal instruction     (core dumped) node /app/server/index.js

Any suggestions for resolving this?

How can I assist?

Thanks

xsn-cloud avatar May 20 '24 16:05 xsn-cloud

This is certainly a configuration issue. Considering all is well until the native embedder is called, this might be arch related - but we support both ARM and x86. Regardless, here are my exact steps, which fail to repro:

  1. Obtain Ubuntu 22.04 LTS AWS instance - used t3.small - x86
  2. curl -fsSL https://get.docker.com -o get-docker.sh
  3. sudo sh get-docker.sh
  4. sudo usermod -aG docker $USER
  5. docker -v

Docker version 26.1.3, build b72abbb

  6. docker pull mintplexlabs/anythingllm

Run:

export STORAGE_LOCATION=$HOME/anythingllm && \
mkdir -p $STORAGE_LOCATION && \
touch "$STORAGE_LOCATION/.env" && \
docker run -d -p 3001:3001 \
--cap-add SYS_ADMIN \
-v ${STORAGE_LOCATION}:/app/server/storage \
-v ${STORAGE_LOCATION}/.env:/app/server/.env \
-e STORAGE_DIR="/app/server/storage" \
mintplexlabs/anythingllm

Access via instance IP on port 3001 - I get the interface, onboard, create workspace, and upload documents.

[Event Logged] - workspace_created
[TELEMETRY SENT] {
  event: 'onboarding_complete',
  distinctId: 'fe73dd18-d52e-4a4c-bb62-a6adba7491d1',
  properties: { runtime: 'docker' }
}
-- Working readme.pdf --
-- Parsing content from pg 1 --
-- Parsing content from pg 2 --
-- Parsing content from pg 3 --
-- Parsing content from pg 4 --
-- Parsing content from pg 5 --
[SUCCESS]: readme.pdf converted & ready for embedding.

[CollectorApi] Document readme.pdf uploaded processed and successfully. It is now available in documents.
[TELEMETRY SENT] {
  event: 'document_uploaded',
  distinctId: 'fe73dd18-d52e-4a4c-bb62-a6adba7491d1',
  properties: { runtime: 'docker' }
}
[Event Logged] - document_uploaded
Adding new vectorized document into namespace sample
[NativeEmbedder] Initialized
[RecursiveSplitter] Will split with { chunkSize: 1000, chunkOverlap: 20 }
Chunks created from document: 14
[NativeEmbedder] The native embedding model has never been run and will be downloaded right now. Subsequent runs will be faster. (~23MB)
[NativeEmbedder] Downloading Xenova/all-MiniLM-L6-v2 from https://huggingface.co/
....truncated
[NativeEmbedder - Downloading model] onnx/model_quantized.onnx 100%
[NativeEmbedder] Embedded Chunk 1 of 1
Inserting vectorized chunks into LanceDB collection.
Caching vectorized results of custom-documents/readme.pdf-d717ca8c-6ac0-4514-8d6d-94ac48760afe.json to prevent duplicated embedding.
[TELEMETRY SENT] {
  event: 'documents_embedded_in_workspace',
  distinctId: 'fe73dd18-d52e-4a4c-bb62-a6adba7491d1',
  properties: {
    LLMSelection: 'openai',
    Embedder: 'native',
    VectorDbSelection: 'lancedb',
    runtime: 'docker'
  }
}
[Event Logged] - workspace_documents_added

Considering all of this occurs at [NativeEmbedder] Initialized, to me this would indicate a lack of resources to run the local embedder; if that is the case, you should allocate more resources to the container or use another embedder. That is the only way I could imagine a full core dump or illegal instruction occurring. Either that, or the underlying chip arch is not found/supported by Xenova's transformers.js.

timothycarambat avatar May 20 '24 17:05 timothycarambat

Hi Timothy,

I think what @xsn-cloud and I have in common is that we both use Ollama.

May that be the cause of the failure?

Joachim

joachimt-git avatar May 20 '24 18:05 joachimt-git

It would not, since the exception is in the AnythingLLM container; if there were an illegal instruction in the Ollama program, it would throw in that container/program. All AnythingLLM does is execute a fetch request to the Ollama instance, which would be permitted in any container.

timothycarambat avatar May 20 '24 19:05 timothycarambat

@timothycarambat Thanks for addressing this issue. Please let me know if there's anything I can assist you with.

I've conducted the following experiment, also to check whether it might be an issue with Docker running in VMs and to rule out resource constraints:

UPDATE: Also tested it in Windows 10 (WSL, Docker for Windows, Docker version 26.1.1, build 4cf5afa): Same issue

  • Clean install of Debian 12 on baremetal - Dual Xeon E5-2650 v2 @ 2.60GHz with 96GB of RAM
  • Docker version 26.1.3, build b72abbb
  • Followed the procedure exactly as you did in your previous comment (the same one you used on an Ubuntu 22.04 LTS AWS)
  • No documents loaded
  • During the onboarding, anything-llm successfully communicates with ollama, accurately retrieves the installed models, and allows the selection of the model without issues.
  • Model selected: llama3 with 4K context window. Other models tested with same results.

This is the outcome after the onboarding when attempting to send a "hello" in a new chat. (docker logs -f [containerid]).

Please note that the container was restarted from scratch to ensure the clarity of the logs.

Collector hot directory and tmp storage wiped!
Document processor app listening on port 8888
Environment variables loaded from .env
Prisma schema loaded from prisma/schema.prisma

✔ Generated Prisma Client (v5.3.1) to ./node_modules/@prisma/client in 338ms

Start using Prisma Client in Node.js (See: https://pris.ly/d/client)

'''
import { PrismaClient } from '@prisma/client'
const prisma = new PrismaClient()
'''


or start using Prisma Client at the edge (See: https://pris.ly/d/accelerate)
'''
import { PrismaClient } from '@prisma/client/edge'
const prisma = new PrismaClient()
'''

See other ways of importing Prisma Client: http://pris.ly/d/importing-client

Environment variables loaded from .env
Prisma schema loaded from prisma/schema.prisma
Datasource "db": SQLite database "anythingllm.db" at "file:../storage/anythingllm.db"

20 migrations found in prisma/migrations


No pending migrations to apply.
┌─────────────────────────────────────────────────────────┐
│  Update available 5.3.1 -> 5.14.0                       │
│  Run the following to update                            │
│    npm i --save-dev prisma@latest                       │
│    npm i @prisma/client@latest                          │
└─────────────────────────────────────────────────────────┘
[TELEMETRY ENABLED] Anonymous Telemetry enabled. Telemetry helps Mintplex Labs Inc improve AnythingLLM.
prisma:info Starting a sqlite pool with 33 connections.
fatal: not a git repository (or any of the parent directories): .git
getGitVersion Command failed: git rev-parse HEAD
fatal: not a git repository (or any of the parent directories): .git

[TELEMETRY SENT] {
  event: 'server_boot',
  distinctId: '4f39e3fb-ac8c-4043-9586-c21ef46b0c47',
  properties: { commit: '--', runtime: 'docker' }
}
[CommunicationKey] RSA key pair generated for signed payloads within AnythingLLM services.
Primary server in HTTP mode listening on port 3001
[NativeEmbedder] Initialized
/usr/local/bin/docker-entrypoint.sh: line 7:   163 Illegal instruction     (core dumped) node /app/server/index.js

One more clarification: The error occurs after sending a message in the chatbox. Until then, the last message displayed is [NativeEmbedder] Initialized, and it remains unchanged until the message is sent.

Thanks a lot for your time.

xsn-cloud avatar May 21 '24 09:05 xsn-cloud

If you were not to use the native embedder, this problem would not surface. The only commonality between all of this is varying CPUs. Transformers.js, which runs the native embedder, uses the ONNX runtime, and at this point the root cause has to be coming from there: the crash only occurs when using the native embedder, and those are the supporting libraries that enable that functionality.

timothycarambat avatar May 21 '24 15:05 timothycarambat

I had the same issue running Docker in Ubuntu 24.04 VM on a Proxmox host. I switched the CPU in the guest to "host," and it fixed the problem. Just wanted to share in case anyone else is having the same struggle I did. Hope this helps!

jorgen-k avatar May 23 '24 12:05 jorgen-k

Does the CPU you swapped to support AVX2?

timothycarambat avatar May 23 '24 15:05 timothycarambat

No, my CPU does not support AVX2; however, it supports AVX.

computersrmyfriends avatar May 23 '24 17:05 computersrmyfriends

Does the CPU you swapped to support AVX2?

I am sorry, I do not know how to check that. I just changed to "host", the CPU being an Intel Core i9-9900K.

jorgen-k avatar May 23 '24 17:05 jorgen-k

At this time, the working hypothesis is that since Transformers.js uses the ONNX runtime, it will fail to execute any model (including the built-in embedder) if AVX2 is not supported: https://github.com/microsoft/onnxruntime

@jorgen-k https://www.intel.com/content/www/us/en/products/sku/186605/intel-core-i99900k-processor-16m-cache-up-to-5-00-ghz/specifications.html

Instruction Set Extensions Intel® SSE4.1, Intel® SSE4.2, Intel® AVX2

Supports AVX2
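For anyone unsure how to check, the CPU flags the guest actually sees are listed in /proc/cpuinfo; note that grepping for just "avx" also matches AVX-only CPUs, so match the exact avx2 token. A small sketch (the helper name and the sample flag strings are illustrative, not real cpuinfo output):

```shell
# Report whether a cpuinfo "flags" line contains the exact avx2 token.
# grep -w avoids a false positive from the plain "avx" flag.
has_avx2() {
  echo "$1" | grep -qw avx2 && echo yes || echo no
}

has_avx2 "fpu sse4_2 avx f16c avx2 bmi2"   # Haswell-era CPU -> yes
has_avx2 "fpu sse4_2 avx f16c"             # AVX-only CPU (e.g. Ivy Bridge) -> no
```

On a live machine, `grep -cw avx2 /proc/cpuinfo` prints 0 on the CPUs affected by this issue.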

timothycarambat avatar May 23 '24 18:05 timothycarambat

I am using KVM as hypervisor, and the virtual CPU doesn't support AVX2. When I configure passthrough of the CPU (which supports AVX2), as @jorgen-k suggested, it works for me as well.
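For plain libvirt/KVM setups, the equivalent knob is the guest CPU mode in the domain XML; a sketch of the relevant fragment (exact placement depends on your libvirt version):

```xml
<!-- libvirt domain XML: expose the host CPU model, including its AVX/AVX2
     flags, to the guest instead of a generic virtual CPU -->
<cpu mode='host-passthrough' check='none'/>
```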

joachimt-git avatar May 24 '24 06:05 joachimt-git

I ran out of luck with the AVX-only CPU. It's a Xeon 2660, which only supports AVX. Had to find another machine.

computersrmyfriends avatar May 24 '24 08:05 computersrmyfriends

I had the same issue with "/usr/local/bin/docker-entrypoint.sh: line 7: 115 Illegal instruction (core dumped) node /app/server/index.js". It seems the new Docker images of AnythingLLM have some issues, possibly on older systems. To fix the issue, I tried using older Docker versions and previous AnythingLLM images. While the older Docker versions did not resolve the issue, the older AnythingLLM images worked great. The newest working version for me was "sha256:1d994f027b5519d4bc5e1299892e7d0be1405308f10d0350ecefc8e717d3154f". You can find it here: https://github.com/Mintplex-Labs/anything-llm/pkgs/container/anything-llm/209298508

Running on Centos7 Linux with (CWP7), 2X Intel(R) Xeon(R) CPU E5-2680 v2, 2X Nvidia 2080TI GPUs

Smocvin avatar May 24 '24 23:05 Smocvin

@Smocvin, excellent work. Okay, then that pretty much nails down commit ca63012c0f569ad775b6fd22b9b7965d61812d77 as the issue commit. In that commit we moved from lancedb 0.1.19 to 0.4.11 (which is what we use on desktop version).

However, given how this issue seems to only be a problem with certain CPUs, we have two choices: bump to 0.5.0 and see if that fixes it, or roll back to 0.1.19. Given that we do not leverage or dive deep into LanceDB's API much, the code change is quite minimal or none.

What I will need, though, is some help from the community, as I do not have a single machine, VM, or instance that I can replicate this bug with. So my ask is:

  • If you are getting this bug on a cloud-container service: what service and instance specs are you using, so we can provision a test instance for replication and debugging?

or

If anyone is willing to help debug the hard way, I am going to create two new tags on Docker, :lancedb_bump and :lancedb_revert, and I would need someone suffering from this issue to pull both and see which works.

Obviously if we can bump up, that would be ideal, but I would rather not field this issue for the rest of time since lancedb should just work.

Links to images

lancedb_bump: docker pull mintplexlabs/anythingllm:lancedb_bump https://hub.docker.com/layers/mintplexlabs/anythingllm/lancedb_bump/images/sha256-40b0b28728d1bb481f01e510e96351a1970ac3fafafe4b2641cb264f0e7f8a93?context=repo

lancedb_revert: docker pull mintplexlabs/anythingllm:lancedb_revert https://hub.docker.com/layers/mintplexlabs/anythingllm/lancedb_revert/images/sha256-f6a8d37a305756255302a8883e445056e1ab2f9ecf301f7c542685689436685d?context=repo

timothycarambat avatar May 25 '24 01:05 timothycarambat

Can repro with a basic cloud instance on Vultr with the following specs: Cloud Compute - Shared CPU, Ubuntu 22.04 LTS x64, Intel High Performance, 25 GB NVMe, 1 vCPU, 1 GB Ram.

Then I basically just:

export STORAGE_LOCATION=$HOME/anything-llm
docker run -d -p 3001:3001 --cap-add SYS_ADMIN -v ${STORAGE_LOCATION}:/app/server/storage -v ${STORAGE_LOCATION}/.env:/app/server/.env -e STORAGE_DIR="/app/server/storage" mintplexlabs/anythingllm

Configured with OpenAI / LanceDB. At that point, I just tried any chat, e.g. typed 'hello'; it hangs for a bit and comes up with the error message shown above, and I can see the Docker container died with the log:

/usr/local/bin/docker-entrypoint.sh: line 7:   102 Illegal instruction     (core dumped) node /app/server/index.js

acote88 avatar May 25 '24 07:05 acote88

I'm happy to help debug here locally with the newly created image tags when available. I have two machines I can test on here with AVX (Debian docker) and AVX2 (Windows docker desktop). I get the core dump on the AVX machine with :latest but the AVX2 machine runs the container fine so I can provide output from both of them if needed.

Dozer316 avatar May 28 '24 04:05 Dozer316

@Dozer316 @acote88 @computersrmyfriends can any of you who have this issue on the master/latest image check and see if lancedb_bump or lancedb_revert works on the impacted machine?

Hopefully the _bump image works; otherwise we are in for some pain, but at least we can debug from there. I am friends with the LanceDB team, so I can escalate to them if the issue persists.

timothycarambat avatar May 29 '24 08:05 timothycarambat

Hey there - revert has solved the problem on the impacted machine, bump still core dumps unfortunately.

Thanks for taking a look at this for us.

Dozer316 avatar May 29 '24 09:05 Dozer316

Same here. _revert works, _bump crashes. Cheers.

acote88 avatar May 29 '24 09:05 acote88

Results of the test:

lancedb_bump: crashes
lancedb_revert: works

Notes:

CPU: AVX only
Testing: tested with local documents; works perfectly.

(edited: several typos, sorry)

xsn-cloud avatar May 29 '24 11:05 xsn-cloud

Thank you @Dozer316 @acote88 @xsn-cloud for all taking the time to test both, which is very tedious. I'll contact the LanceDB team, and also see if we can roll back the docker vectordb package in the interim.

timothycarambat avatar May 30 '24 00:05 timothycarambat

I just closed my report out #1618 because it was caused by the same thing. AVX was not a flag on the virtual CPU.

I set the virtual CPU to pass through and it solved the issues.

Thank you @xsn-cloud

cyberlink1 avatar Jun 06 '24 11:06 cyberlink1

Okay, so the reason this issue occurs is that LanceDB has had a minimum target of Haswell since version ~0.20. This is because performance with AVX2 is just much better.

So right now there are two options to go around this:

  • Upgrade or migrate use to a CPU that supports AVX2
  • We can maintain an image that works with the older vectordb package. Truthfully, I really don't want to do that, and we have to draw the line somewhere. The reason I don't is that, while the code change is minor, this will likely become increasingly burdensome to maintain as we continue to bump Lance to later versions. Knowing where the issue lies, though, is very useful.

Either way, the root cause is the requirement for the underlying CPU to have AVX2. Closing for now as wontfix, but discussion is still open for any further commentary.
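As background on the error text itself: "Illegal instruction" is the shell reporting that the node process was killed by SIGILL (signal 4), which the CPU raises when it hits an opcode it does not implement - here, an AVX2 instruction in a binary built for Haswell. A shell reports a process killed by signal N with exit status 128+N, which is why the crashed container shows exit code 132:

```shell
# SIGILL is signal 4; a process killed by signal N exits with status 128+N.
echo "SIGILL exit status: $((128 + 4))"   # prints 132
```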

timothycarambat avatar Jun 06 '24 21:06 timothycarambat

Thanks for following up on this Timothy. In case this can help others, I compared 2 types of instances on Vultr. One called "High Performance" and the other one called "High Frequency". The "High Frequency" one does support AVX2, while the other doesn't. You can check by running:

cat /proc/cpuinfo | grep -i avx

acote88 avatar Jun 07 '24 09:06 acote88

You have no idea how long I've had to search everywhere and how many reconfigurations and reinstalls I did before I found this thread. Could you MAYBE write SOMEWHERE that currently AnythingLLM requires an AVX2 CPU to work properly?

Nododot avatar Jun 14 '24 17:06 Nododot

Hello @timothycarambat

Thank you for publishing the lancedb_revert image at all in the first place.

Currently, googling the error message took me to this thread, which in turn links to this one.

To resolve the issue, all I had to do was update the docker run command with the lancedb_revert tag, and otherwise "off it went".

My PC is old, but it's what I've got, and sadly upgrading just isn't on the cards any time soon - I'm grateful to have a way to try it out at all.

I appreciate it's unreasonable to put in ongoing effort for a small subset of users running into incompatibility problems because they insist on using a relic from the before times - especially since it's going to start increasingly cropping up elsewhere as well.

Having an image at all in the first place is great, but it'd be nice if there was some way to "run out the clock" on updates until breaking changes inevitably come along.

Would it be possible to have an unsupported update that pins the version of lancedb in place, dumps latest and/or dev over the top of it and "When it breaks, that's the end of the ride.... May the odds be ever in your favour"?

When it does, ideally the docker image gets a 2nd "unsupported final build" release based on that point version and that's the end of that.

akrotor avatar Aug 01 '24 11:08 akrotor