
Failed to run on Intel GPUs

Open · rnwang04 opened this issue 11 months ago · 2 comments

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [ ] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [ ] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I expect llama-cpp-python to run normally on Intel GPUs, just as llama.cpp does.

Current Behavior

llama-cpp-python fails to run on Intel GPUs, while the llama.cpp SYCL backend runs normally.

Environment and Context

I tested SYCL support on an Intel Arc A770 GPU, on Ubuntu 22.04 with oneAPI 2024.0. I have verified that the llama.cpp SYCL backend works normally on this machine.
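For reference, the SYCL devices visible to the oneAPI runtime can be listed with sycl-ls, which ships with the oneAPI Base Toolkit; the Arc A770 should show up as a Level Zero GPU device (the exact output line below is illustrative):

source /opt/intel/oneapi/setvars.sh
sycl-ls
# expected to include a line something like:
# [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics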

Steps to Reproduce

conda create -n llm python=3.9
conda activate llm
source /opt/intel/oneapi/setvars.sh 
CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
python test.py
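A note on the pip step above: if pip reuses a previously built wheel, the CMAKE_ARGS are silently ignored. A variant that forces a from-source rebuild (FORCE_CMAKE is documented by llama-cpp-python; the pip flags are standard):

CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" \
FORCE_CMAKE=1 pip install --force-reinstall --no-cache-dir llama-cpp-python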

where test.py is:

import os
from llama_cpp import Llama

llm = Llama(
      model_path=os.path.expanduser("~/llama.cpp/models/7B/ggml-model-q4_0-pure.gguf"),  # "~" is not expanded by the underlying C file open, so expand it here
      n_gpu_layers=33, # Offload all 33 layers of the 7B model to the GPU
      seed=1337, # Set a specific seed for reproducible output
      # n_ctx=2048, # Uncomment to increase the context window
)
output = llm(
      "Q: Name the planets in the solar system? A: ", # Prompt
      max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window
      stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
      echo=True # Echo the prompt back in the output
) # Generate a completion, can also call create_completion
print(output)
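For comparison, the standalone llama.cpp SYCL build (which works on this machine) ships a device-listing example; running it shows what the backend is expected to enumerate (the build path below is illustrative):

cd ~/llama.cpp
./build/bin/ls-sycl-device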

Failure Logs

ggml_init_sycl: GGML_SYCL_DEBUG: 0
ggml_init_sycl: GGML_SYCL_F16: no
found 2 SYCL devices:
|ID| Name                                        |compute capability|Max compute units|Max work group|Max sub group|Global mem size|
|--|---------------------------------------------|------------------|-----------------|--------------|-------------|---------------|
| 0|         13th Gen Intel(R) Core(TM) i9-13900K|               3.0|               32|          8192|           64|    67181625344|
| 1|               Intel(R) FPGA Emulation Device|               1.2|               32|      67108864|           64|    67181625344|
DeviceList is empty. -30 (PI_ERROR_INVALID_VALUE)Exception caught at file:/tmp/pip-install-31terybs/llama-cpp-python_2e42ff812a094f19b998956fddc30615/vendor/llama.cpp/ggml-sycl.cpp, line:13341
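
Note that the device table above lists only the CPU and the FPGA emulation device; the Arc A770 GPU is not enumerated at all, which is presumably why the backend's device list ends up empty. The GGML_SYCL_DEBUG variable echoed in the first log line can be set for more tracing (that it adds device-selection detail is an assumption):

GGML_SYCL_DEBUG=1 python test.py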

rnwang04 · Mar 13 '24 06:03