SYCL backend error PI_ERROR_INVALID_WORK_GROUP_SIZE on iGPU UHD 770
When offloading to the iGPU UHD 770 in a Docker container from https://github.com/mudler/LocalAI (b2128), llama.cpp crashes with the following error:
The number of work-items in each dimension of a work-group cannot exceed {512, 512, 512} for this device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)
Exception caught at file:/build/backend/cpp/llama/llama.cpp/ggml-sycl.cpp, line:12708
From trial and error, it happens when the number of predicted tokens exceeds 256: if I limit generation to 256 tokens, the crash does not occur.
Tested with multiple 7B Mistral models with both Q6 and Q8 quantization.
Intel oneAPI version 2024.0
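For reference, a minimal sketch of the same workaround when calling llama.cpp's CLI directly (LocalAI exposes the token limit through its model config instead); the model path and `-ngl` value here are placeholders, not the exact setup from the report:

```bash
# Source the oneAPI environment so the SYCL runtime can see the iGPU.
source /opt/intel/oneapi/setvars.sh

# Cap generation at 256 tokens (-n) to stay under the point where the
# work-group size error appears; model path and layer count are placeholders.
./build/bin/main -m ./models/mistral-7b-q6_k.gguf -ngl 33 -n 256 -p "Hello"
```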
Hi @fakezeta, the UHD 770 might be limited either in performance or in functionality, as noted in the README.
Thank you @airMeng, I read the README, but it only warns about performance, not functionality. Testing with OpenVINO, the same oneAPI version, and the same models, I did not experience any issues (and got 6x the performance, but that's another story :) ).
@fakezeta I guess https://github.com/mudler/LocalAI uses a very old ggml SYCL library. This issue has been fixed in llama.cpp.
Maybe you could try llama.cpp directly on the iGPU UHD 770; refer to the SYCL guide (a rough sketch is below). If it works, that confirms my guess, and you would need to ask LocalAI to upgrade its ggml SYCL code.
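A rough sketch of such a test, loosely following the SYCL guide as it stood around that time; flag and binary names have changed across versions (newer trees use `GGML_SYCL=ON` and `llama-cli`), so treat this as an assumption and check README-sycl.md for your revision:

```bash
# Set up the oneAPI toolchain (icx/icpx) and SYCL runtime.
source /opt/intel/oneapi/setvars.sh

# Build llama.cpp with the SYCL backend enabled (older flag name shown here).
mkdir -p build && cd build
cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build . --config Release -j

# List SYCL devices and pick the UHD 770; the device-selection variable and
# binary names may differ by version. Generating more than 256 tokens (-n 512)
# without the work-group error would confirm the fix. Model path is a placeholder.
./bin/ls-sycl-device
GGML_SYCL_DEVICE=0 ./bin/main -m ../models/mistral-7b-q6_k.gguf -ngl 33 -n 512 -p "Hello"
```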
Sorry for the late reply. Tested with a more recent build and I can confirm that it's working fine.
Thank you.