Misc. bug: OpenCL context reference counting is wrong in llama-bench
Name and Version
version: 6024 (834c0ea2) built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-bench
Command line
./llama-bench -m /home/rob/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p 32,64 -ngl 33 -r 0
Problem description & steps to reproduce
The crux of the problem is that an OpenCL context is created by ggml_opencl_probe_devices, reached from ggml_backend_load_all called at llama-bench.cpp:1838.
However, that context is released by ggml_cl2_free, reached from llama_free called at llama-bench.cpp:2189, which sits inside the benchmark for-loop. As a result, on the next iteration of the loop there is no valid OpenCL context and the drivers return errors.
I'm not sure whether the problem is where the OpenCL backend releases the context, whether llama-bench shouldn't be calling llama_free inside the loop, or whether llama-bench should take another reference to the backend context inside the loop.
Any advice on the best way to fix the issue would be much appreciated.
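For illustration, here is a rough sketch of the third option. This is hypothetical code, not the actual ggml-opencl implementation (opencl_instance, opencl_instance_init and opencl_instance_free are made-up names): the idea is that each backend instance holding the shared cl_context takes its own reference with clRetainContext and drops exactly one reference when it is freed, so llama_free inside the loop no longer destroys the context created during device probing.

```cpp
// Hypothetical sketch, not llama.cpp code: per-instance reference
// counting on a cl_context that is shared by all backend instances.
#include <CL/cl.h>

struct opencl_instance {
    cl_context context; // shared context created once during device probing
};

// Called whenever a new backend instance starts using the shared context
// (e.g. on each llama_init inside the benchmark loop).
static void opencl_instance_init(opencl_instance * inst, cl_context shared_ctx) {
    clRetainContext(shared_ctx); // take a reference for this instance
    inst->context = shared_ctx;
}

// Called from the backend free path (e.g. from ggml_cl2_free).
static void opencl_instance_free(opencl_instance * inst) {
    clReleaseContext(inst->context); // drop only this instance's reference
    inst->context = nullptr;         // the probe's reference keeps the context alive
}
```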
First Bad Commit
No response
Relevant log output
@max-krasnyansky , @lhez as people who seem to know about the OpenCL backend, do you have an opinion on the right way to fix this?
Oh. Missed this one earlier. Will try to reproduce locally and report back.
@robquill Could you share the error message you got as well as your environment (which GPU, driver version)?
I originally hit this on a PowerVR GPU, where the extra releases actually cause a failure. I tested just now on my Intel GPU and llama-bench runs correctly there. However, I have been able to demonstrate the problem using the OpenCL Intercept Layer (https://github.com/intel/opencl-intercept-layer).
export CLI_LeakChecking=1
~/git/opencl-intercept-layer/build/cliloader/cliloader ./bin/llama-bench -m /home/rob/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p 32,64 -ngl 33 -r 0
gives:
Leak Checking:
Unexpected counts for type cl_context!
Number of Allocations: 1
Number of Retains: 0
Number of Releases: 3
...
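For reference, the invariant the leak checker enforces is that, per cl_context, allocations plus retains must equal releases at exit; the output above shows 1/0/3, i.e. two releases too many. A minimal standalone illustration of balanced counts (not llama.cpp code, just the plain OpenCL API):

```cpp
// Minimal illustration of the reference-count balance the intercept
// layer checks: allocations + retains == releases for each cl_context.
#include <CL/cl.h>

int main() {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);

    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr); // 1 allocation
    clRetainContext(ctx);   // 1 retain
    clReleaseContext(ctx);  // 1st release
    clReleaseContext(ctx);  // 2nd release: counts balance, context is destroyed
    return 0;
}
```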
Thank you for the details! I have a setup for the OpenCL Intercept Layer. I will reproduce on my end and investigate.