Can't start LocalAI (with REBUILD) on Xeon X5570 - Unwanted AVX dependency?

Open chris-hatton opened this issue 1 year ago • 4 comments

LocalAI version: Using Docker image: localai/localai:latest-aio-gpu-hipblas

Environment, CPU architecture, OS, and Version:

  • Ubuntu 22.04
  • Xeon X5570 (no AVX support at all)
  • GPU: Radeon 6700XT (gfx1031, but I've seen it behave as gfx1030-compatible in a previous ROCm test)
  • Installed the ROCm driver per AMD's instructions (version 6.2.60200)

Describe the bug

version: '3.9'
services:
  api:
    image: localai/localai:latest-aio-gpu-hipblas
    restart: "no"
    ports:
      - 8082:8080
    volumes:
      - /raid/AI/Models:/build/models:cached
    environment:
      - "DEBUG=true"
      - "REBUILD=true"
      - "CMAKE_ARGS=-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF -DGGML_FMA=OFF -DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF"
      - "BUILD_TYPE=hipblas"
      - "GPU_TARGETS=gfx1030"
      - "BUILD_PARALLELISM=16"
    devices:
      - /dev/dri
      - /dev/kfd
networks:
  apollonet:
    external: true
    name: apollonet
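
Aside: a quick way to catch YAML slips in a file like this, e.g. a devices: key accidentally nested under environment:, is to have Compose validate it before starting anything. A minimal check, assuming Compose v2 and the file saved as docker-compose.yaml:

docker compose -f docker-compose.yaml config --quiet && echo "compose file parses cleanly"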

To Reproduce

  • Try to start the Docker container with the above configuration.
  • The Docker container goes through a lengthy build process lasting several hours.
  • The rebuild appears to complete(?), or at least gets a significant way through, then the container quits with error code 132.
  • Exit 132 is SIGILL (132 - 128 = signal 4), meaning an illegal instruction was executed. I suspect AVX is still being used by something; my CPU doesn't support AVX at all, but since I expect to run all inference on the GPU, CPU acceleration should be unimportant(?). (See the sanity checks sketched below.)
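
For reference, a minimal sanity-check sketch from a host shell; none of these commands are from the original report, and the container name is a placeholder:

# Confirm the CPU really lacks AVX: no output means no AVX of any kind.
grep -o 'avx[0-9a-z_]*' /proc/cpuinfo | sort -u

# Decode the exit code: 132 - 128 = 4, and signal 4 is SIGILL.
kill -l 4    # prints: ILL

# Read the exit code back from a stopped container (name is hypothetical).
docker inspect --format '{{.State.ExitCode}}' localai-api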

Expected behavior

LocalAI starts and serves an interface on the exposed host port 8082, ready to run inference on the AMD card.

Logs

The last lines of the startup/rebuild log, before the exit, are:

cp llama.cpp/build/bin/grpc-server .
make[2]: Leaving directory '/build/backend/cpp/llama-fallback'
make[1]: Entering directory '/build'
/usr/bin/upx backend/cpp/llama-fallback/grpc-server
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2020
UPX 3.96        Markus Oberhumer, Laszlo Molnar & John Reiser   Jan 23rd 2020
        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
  85962576 ->  17753404   20.65%   linux/amd64   grpc-server
Packed 1 file.
make[1]: Leaving directory '/build'
cp -rfv backend/cpp/llama-fallback/grpc-server backend-assets/grpc/llama-cpp-fallback
'backend/cpp/llama-fallback/grpc-server' -> 'backend-assets/grpc/llama-cpp-fallback'
I local-ai build info:
I BUILD_TYPE: hipblas
I GO_TAGS: 
I LD_FLAGS: -s -w -X "github.com/mudler/LocalAI/internal.Version=v2.19.3" -X "github.com/mudler/LocalAI/internal.Commit=86f8d5b50acd8fe88af4f537be0d42472772b928"
I UPX: /usr/bin/upx
CGO_LDFLAGS="-O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link -L/opt/rocm/lib/llvm/lib" go build -ldflags "-s -w -X "github.com/mudler/LocalAI/internal.Version=v2.19.3" -X "github.com/mudler/LocalAI/internal.Commit=86f8d5b50acd8fe88af4f537be0d42472772b928"" -tags "" -o local-ai ./

chris-hatton avatar Aug 24 '24 02:08 chris-hatton

@chris-hatton looks like something is happening behind the scenes - care to share the dmesg output when that happens?

Also, why use REBUILD? The image should already ship binaries for the CPU flagset variants, including non-AVX - did you try disabling it? Your GPU also seems to be covered by the default GPU_TARGETS.
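
For reference, a minimal sketch of the suggested no-rebuild test, reusing the image, port, volume, and device mappings from the compose file above:

docker run --rm -p 8082:8080 \
  --device /dev/dri --device /dev/kfd \
  -v /raid/AI/Models:/build/models \
  -e DEBUG=true \
  localai/localai:latest-aio-gpu-hipblas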

mudler avatar Aug 24 '24 07:08 mudler

@mudler

Regarding...

The image should have already binaries for the CPU flagset variants, including non-AVX

...if I simply run the image localai/localai:latest-aio-gpu-hipblas with no rebuild, then the last lines in the log are:

===> LocalAI All-in-One (AIO) container starting...
AMD GPU detected
Non-NVIDIA GPU detected. Specific GPU memory size detection is not implemented.
===> Starting LocalAI[gpu-8g] with the following models: /aio/gpu-8g/embeddings.yaml,/aio/gpu-8g/rerank.yaml,/aio/gpu-8g/text-to-speech.yaml,/aio/gpu-8g/image-gen.yaml,/aio/gpu-8g/text-to-text.yaml,/aio/gpu-8g/speech-to-text.yaml,/aio/gpu-8g/vision.yaml
@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name	: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida flush_l1d
CPU: no AVX    found
CPU: no AVX2   found
CPU: no AVX512 found
@@@@@

Then the container halts with exit code 132, i.e. an illegal instruction, which on this CPU almost certainly means AVX. So it looks like something in LocalAI's setup still requires AVX.

It's that guidance in the log that prompted me to rebuild.
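
For reference, one way to pin down which binary raises the SIGILL, sketched under the assumption that the kernel logs invalid-opcode traps; the path is inside the container (taken from the build log above), and the UPX-packed binary must be unpacked before disassembly:

# The kernel names the faulting binary on an illegal-instruction trap.
sudo dmesg | grep -i 'invalid opcode'

# Unpack the UPX-compressed server, then count AVX (VEX-encoded) mnemonics;
# a nonzero count means the build still contains AVX instructions.
upx -d backend-assets/grpc/llama-cpp-fallback
objdump -d backend-assets/grpc/llama-cpp-fallback | grep -cE 'vmovaps|vfmadd|vbroadcast'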

chris-hatton avatar Nov 16 '24 04:11 chris-hatton

@chris-hatton gotcha, I did not pay attention to the messages showing that you are actually using the hipblas images. Could you try https://github.com/mudler/LocalAI/pull/4167 once it's merged into master and see if it solves your issue?

mudler avatar Nov 16 '24 13:11 mudler

Unfortunately #4167 did not resolve the issue, as noted here

chris-hatton avatar Nov 18 '24 09:11 chris-hatton

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Jul 21 '25 02:07 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Jul 27 '25 02:07 github-actions[bot]