whisper.cpp

Ability to free all backend resources separately

Open maximsml opened this issue 1 year ago • 0 comments

We ran into the following issue while building each backend as a separately loaded DLL. In "ggml-backend.c" there are two functions involved in creating a backend buffer: ggml_backend_buft_alloc_buffer, which calls the backend interface to allocate the buffer, and ggml_backend_buffer_init, which allocates the backend buffer object with the local "malloc". To free a backend buffer there is only one function, ggml_backend_buffer_free, which calls the backend interface to free the buffer (if that method is present on the interface) and then calls the local "free" function.

When the CUDA implementation is placed in a separate DLL, ggml_backend_buft_alloc_buffer calls the interface method ggml_backend_cuda_buffer_type_alloc_buffer to prepare the buffer, which in turn calls ggml_backend_buffer_init. With separate DLLs there are two copies of ggml_backend_buffer_init: one stays in the whisper library and the other is linked into the CUDA backend DLL. As a result, the ggml_backend_buffer_t pointer is allocated in the backend DLL but freed in the application, which causes a problem.

Could you change the code so that a buffer allocated by a backend call is also freed on the backend side? For example, either skip the local "free" when "buffer->iface.free_buffer" is specified, or add an additional method to "buffer->iface" for disposing of the actual "buffer" object.
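To illustrate the second option, here is a minimal sketch in C. It does not use the real ggml definitions: the demo_buffer types, the free_struct callback, and the two counters are hypothetical names made up for this example. The only point is that the module that called "malloc" for the buffer object is also the one that calls "free" on it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the ggml buffer types.
   These are NOT the real ggml definitions. */
struct demo_buffer;

struct demo_buffer_i {
    /* releases the backend memory held in buffer->context */
    void (*free_buffer)(struct demo_buffer * buffer);
    /* hypothetical extra method: releases the buffer object itself */
    void (*free_struct)(struct demo_buffer * buffer);
};

struct demo_buffer {
    struct demo_buffer_i iface;
    void * context;
    size_t size;
};

/* Counters used only to observe which code path ran in this demo. */
static int backend_ctx_frees    = 0;
static int backend_struct_frees = 0;

/* ---- code that would live inside the backend DLL ---- */

static void backend_free_buffer(struct demo_buffer * b) {
    free(b->context);      /* backend memory, allocated by the backend */
    b->context = NULL;
    backend_ctx_frees++;
}

static void backend_free_struct(struct demo_buffer * b) {
    free(b);               /* same module that called malloc for b */
    backend_struct_frees++;
}

static struct demo_buffer * backend_alloc_buffer(size_t size) {
    struct demo_buffer * b = malloc(sizeof(*b));
    if (b == NULL) {
        return NULL;
    }
    b->context = malloc(size);
    if (b->context == NULL) {
        free(b);
        return NULL;
    }
    b->iface.free_buffer = backend_free_buffer;
    b->iface.free_struct = backend_free_struct;
    b->size = size;
    return b;
}

/* ---- code that would live in the application / core library ---- */

static void demo_buffer_free(struct demo_buffer * b) {
    if (b == NULL) {
        return;
    }
    if (b->iface.free_buffer != NULL) {
        b->iface.free_buffer(b);
    }
    if (b->iface.free_struct != NULL) {
        b->iface.free_struct(b);   /* object is released by its allocator */
    } else {
        free(b);                   /* fallback: object was allocated locally */
    }
}
```

With this layout the application never calls its own "free" on an object that the backend DLL allocated. Buffers created locally leave free_struct as NULL and keep the current behavior.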

maximsml · Mar 25 '24 12:03