llama-cpp-python
CUDA not supported. `ValueError: Attempt to split tensors that exceed maximum supported devices. Current LLAMA_MAX_DEVICES=1`
This was a problem that I think was prematurely closed:
https://github.com/abetlen/llama-cpp-python/issues/1166
I'm currently trying to get a Llama 3.1 70B GGUF running on two 3090s, and no matter which installation method I use, I get the same error. Moreover, llama_cpp.llama_supports_gpu_offload() always reports False, even though it can use a single GPU.
Error:
# the env var is never even picked up (!?); it still reports 1 device
$ LLAMA_MAX_DEVICES=2 my_thing
ValueError: Attempt to split tensors that exceed maximum supported devices. Current LLAMA_MAX_DEVICES=1
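For context, the call that triggers this looks roughly like the following (the model path and split ratios are just placeholders for my setup):
from llama_cpp import Llama
# placeholder path; any tensor_split with two entries trips the device-count check
llm = Llama(
    model_path="/models/llama-3.1-70b-instruct.Q4_K_M.gguf",
    n_gpu_layers=-1,          # offload all layers
    tensor_split=[0.5, 0.5],  # split across the two 3090s
)
As soon as tensor_split has more entries than llama_cpp.llama_max_devices() returns, that ValueError is raised, so the real problem is the device count of 1 shown further down.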
Installation Methods:
# 1
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122
# 2
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
# 4
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
# 5
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 --upgrade --force-reinstall --no-cache-dir
# 6 a downgrade was reported in this issue to work, but it does not
pip install llama-cpp-python==0.2.77 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 --upgrade --force-reinstall
# 7
pip install llama-cpp-python==0.2.76 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 --upgrade --force-reinstall
# 8
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.2.77 --upgrade --force-reinstall --no-cache-dir --verbose
# 9 build fails (see the note after these attempts)
git checkout "v0.2.77"
CMAKE_ARGS="-DGGML_CUDA=on" pip install -e . --upgrade --force-reinstall --no-cache-dir --verbose
CMake Error at CMakeLists.txt:25 (add_subdirectory):
The source directory
llama-cpp-python/vendor/llama.cpp
does not contain a CMakeLists.txt file.
# 10 the downgraded build fails the same way
git clone ...
CMAKE_ARGS="-DGGML_CUDA=on" pip install -e ../lib/llama-cpp-python/ --verbose
# 11 maybe we copy that CMakeLists in? nope.
$ cp CMakeLists.txt vendor/llama.cpp/
$ CMAKE_ARGS="-DGGML_CUDA=on" pip install -e . --upgrade --force-reinstall --no-cache-dir --verbose
CMake Error at vendor/llama.cpp/CMakeLists.txt:25 (add_subdirectory):
add_subdirectory given source "vendor/llama.cpp" which is not an existing
directory.
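For what it's worth, the missing vendor/llama.cpp CMakeLists.txt in attempts #9, #10, and #11 usually just means the llama.cpp git submodule was never checked out. Cloning with submodules (or initializing them afterwards) should at least let the source build configure, though I can't say yet whether the resulting build reports more than 1 device:
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python
cd llama-cpp-python
git submodule update --init --recursive   # populates vendor/llama.cpp
CMAKE_ARGS="-DGGML_CUDA=on" pip install -e . --upgrade --force-reinstall --no-cache-dir --verbose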
After each attempt (besides the source builds, which all fail):
$ python -c "import llama_cpp; print(llama_cpp.llama_max_devices())"
1
$ python -c "import llama_cpp; print(llama_cpp.llama_supports_gpu_offload())"
False
$ python3 -c "import torch; print(torch.cuda.device_count())"
2
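One more check that might help narrow this down (just a guess on my part): with --extra-index-url, pip falls back to the plain sdist on PyPI if none of the cu122 wheels match the local Python version/platform, which would explain a CPU-only install. Downloading without installing shows which file pip actually resolves:
$ pip download llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 --no-deps -d /tmp/llama-wheels
$ ls /tmp/llama-wheels  # a .tar.gz here means the sdist was selected, not a CUDA wheel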
Originally posted by @freckletonj in https://github.com/abetlen/llama-cpp-python/issues/1166#issuecomment-2294990187
I was running into the same problem; I reported it here: https://github.com/abetlen/llama-cpp-python/issues/1693