SYCL backend error PI_ERROR_INVALID_WORK_GROUP_SIZE on iGPU UHD 770
When offloading to the iGPU UHD 770 in a Docker container from https://github.com/mudler/LocalAI (b2128), llama.cpp crashes with the following error:
The number of work-items in each dimension of a work-group cannot exceed {512, 512, 512} for this device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)
Exception caught at file:/build/backend/cpp/llama/llama.cpp/ggml-sycl.cpp, line:12708
From trial and error, it happens when the number of predicted tokens exceeds 256: if I limit generation to 256 tokens, the crash does not occur.
Tested with multiple 7B Mistral models with both Q6 and Q8 quantization.
Intel oneAPI version 2024.0
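For reference, a minimal sketch of the same workaround when calling llama.cpp's CLI directly (LocalAI exposes the token limit through its model config instead); the model path and `-ngl` value here are placeholders, not the exact setup from the report:

```bash
# Source the oneAPI environment so the SYCL runtime can see the iGPU.
source /opt/intel/oneapi/setvars.sh

# Cap generation at 256 tokens (-n) to stay under the point where the
# work-group size error appears; model path and layer count are placeholders.
./build/bin/main -m ./models/mistral-7b-q6_k.gguf -ngl 33 -n 256 -p "Hello"
```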
Hi @fakezeta, the UHD 770 might be limited either in performance or in functionality, as noted in the README.
Thank you @airMeng, I read the README, but it only warns about performance, not functionality. Testing with OpenVINO, the same oneAPI version, and the same models, I did not experience any issues (and got 6x the performance, but that's another story :) ).
@fakezeta I guess https://github.com/mudler/LocalAI uses a very old ggml SYCL library. This issue has been fixed in llama.cpp.
Maybe you could try llama.cpp directly on the iGPU UHD 770; refer to the SYCL guide (a rough sketch is below). If it works, that confirms my guess, and you would need to ask LocalAI to upgrade its ggml SYCL code.
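A rough sketch of such a test, loosely following the SYCL guide as it stood around that time; flag and binary names have changed across versions (newer trees use `GGML_SYCL=ON` and `llama-cli`), so treat this as an assumption and check README-sycl.md for your revision:

```bash
# Set up the oneAPI toolchain (icx/icpx) and SYCL runtime.
source /opt/intel/oneapi/setvars.sh

# Build llama.cpp with the SYCL backend enabled (older flag name shown here).
mkdir -p build && cd build
cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build . --config Release -j

# List SYCL devices and pick the UHD 770; the device-selection variable and
# binary names may differ by version. Generating more than 256 tokens (-n 512)
# without the work-group error would confirm the fix. Model path is a placeholder.
./bin/ls-sycl-device
GGML_SYCL_DEVICE=0 ./bin/main -m ../models/mistral-7b-q6_k.gguf -ngl 33 -n 512 -p "Hello"
```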
Sorry for the late reply. Tested with a more recent build and I can confirm that it's working fine.
Thank you.