LocalAI
Can't start LocalAI (with REBUILD) on Xeon X5570 - Unwanted AVX dependency?
LocalAI version:
Using Docker image: `localai/localai:latest-aio-gpu-hipblas`
Environment, CPU architecture, OS, and Version:
- Ubuntu 22.04
- Xeon X5570 (no AVX support at all)
- GPU: Radeon 6700XT (`gfx1031`, but I've seen this to be `gfx1030`-compatible in a previous ROCm test)
- Installed ROCm driver per AMD instructions (version 6.2.60200)
Describe the bug
- I can't get LocalAI to start up, using the following Docker Compose stack file (inspired by ROCm Acceleration instructions):
```yaml
version: '3.9'
services:
  api:
    image: localai/localai:latest-aio-gpu-hipblas
    restart: "no"
    ports:
      - 8082:8080
    volumes:
      - /raid/AI/Models:/build/models:cached
    environment:
      - "DEBUG=true"
      - "REBUILD=true"
      - "CMAKE_ARGS=-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF -DGGML_FMA=OFF -DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF"
      - "BUILD_TYPE=hipblas"
      - "GPU_TARGETS=gfx1030"
      - "BUILD_PARALLELISM=16"
    devices:
      - /dev/dri
      - /dev/kfd
networks:
  apollonet:
    external: true
    name: apollonet
```
To Reproduce
- Try to start the Docker container with the above configuration.
- The Docker container goes through a lengthy build process lasting several hours.
- It looks like the rebuild completes(?), or at least gets a significant way through, then the container quits with exit code 132.
- Exit code 132 is SIGILL (132 - 128 = signal 4), meaning an illegal instruction, so I suspect AVX is still being used by something. My CPU doesn't support AVX at all, but since I expect to run all inference on the GPU, CPU acceleration should be unimportant(?)
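For reference, the exit-code arithmetic above can be checked directly in a shell; a minimal sketch, assuming a Linux host (the AVX-flag check reads `/proc/cpuinfo`, which only exists on Linux):

```shell
#!/usr/bin/env bash
# Exit codes above 128 encode "killed by signal (code - 128)";
# `kill -l <num>` maps the signal number back to its name.
code=132
sig=$((code - 128))
echo "signal $sig = SIG$(kill -l "$sig")"   # signal 4 = SIGILL

# List any AVX-family flags the CPU advertises (empty on this Xeon):
grep -o -E 'avx[0-9a-z_]*' /proc/cpuinfo 2>/dev/null | sort -u
```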
Expected behavior
LocalAI starts and serves an interface on the exposed host port 8082, ready to run inference on the AMD card.
Logs
The last lines of the startup/rebuild log, before the exit, are:
```
cp llama.cpp/build/bin/grpc-server .
make[2]: Leaving directory '/build/backend/cpp/llama-fallback'
make[1]: Entering directory '/build'
/usr/bin/upx backend/cpp/llama-fallback/grpc-server
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2020
UPX 3.96 Markus Oberhumer, Laszlo Molnar & John Reiser Jan 23rd 2020
File size Ratio Format Name
-------------------- ------ ----------- -----------
85962576 -> 17753404 20.65% linux/amd64 grpc-server
Packed 1 file.
make[1]: Leaving directory '/build'
cp -rfv backend/cpp/llama-fallback/grpc-server backend-assets/grpc/llama-cpp-fallback
'backend/cpp/llama-fallback/grpc-server' -> 'backend-assets/grpc/llama-cpp-fallback'
I local-ai build info:
I BUILD_TYPE: hipblas
I GO_TAGS:
I LD_FLAGS: -s -w -X "github.com/mudler/LocalAI/internal.Version=v2.19.3" -X "github.com/mudler/LocalAI/internal.Commit=86f8d5b50acd8fe88af4f537be0d42472772b928"
I UPX: /usr/bin/upx
CGO_LDFLAGS="-O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link -L/opt/rocm/lib/llvm/lib" go build -ldflags "-s -w -X "github.com/mudler/LocalAI/internal.Version=v2.19.3" -X "github.com/mudler/LocalAI/internal.Commit=86f8d5b50acd8fe88af4f537be0d42472772b928"" -tags "" -o local-ai ./
```
@chris-hatton looks like something behind the scenes is happening - care to share the dmesg output from when that happens?
Also, why use REBUILD? The image should already have binaries for the CPU flagset variants, including non-AVX - did you try disabling it? Your GPU also seems to be covered by the default GPU_TARGETS.
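To help with the dmesg capture: when a process dies on an illegal instruction, the kernel normally logs a `traps:` line naming the faulting binary. A minimal sketch for the host (assumes Linux; reading the kernel ring buffer may require root, and the `grpc-server` name shown in the comment is only an illustration):

```shell
# Look for the kernel's illegal-instruction trap entry right after the
# container dies; it names the faulting binary and instruction pointer,
# in a line shaped like:
#   traps: grpc-server[1234] trap invalid opcode ip:... sp:... error:0
# (run with sudo if dmesg is restricted; "|| true" keeps the pipeline
# from failing when no trap has been logged yet)
dmesg 2>/dev/null | grep -iE 'trap invalid opcode' || true
```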
@mudler
Regarding...
> The image should already have binaries for the CPU flagset variants, including non-AVX
...if I simply run the image localai/localai:latest-aio-gpu-hipblas with no rebuild, then the last lines in the log are:
```
===> LocalAI All-in-One (AIO) container starting...
AMD GPU detected
Non-NVIDIA GPU detected. Specific GPU memory size detection is not implemented.
===> Starting LocalAI[gpu-8g] with the following models: /aio/gpu-8g/embeddings.yaml,/aio/gpu-8g/rerank.yaml,/aio/gpu-8g/text-to-speech.yaml,/aio/gpu-8g/image-gen.yaml,/aio/gpu-8g/text-to-text.yaml,/aio/gpu-8g/speech-to-text.yaml,/aio/gpu-8g/vision.yaml
@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida flush_l1d
CPU: no AVX found
CPU: no AVX2 found
CPU: no AVX512 found
@@@@@
```
Then the container halts with exit code 132, meaning an illegal instruction - presumably an AVX instruction my CPU lacks. So it looks like something in LocalAI's setup still requires AVX.
It's that guidance in the log that prompted me to rebuild.
@chris-hatton gotcha, I did not pay attention to the messages showing that you are actually using the hipblas images. Could you try https://github.com/mudler/LocalAI/pull/4167 to see if it solves your issue once merged into master?
Unfortunately, #4167 did not resolve the issue, as noted here
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.