
ggml : become thread-safe

Open ggerganov opened this issue 7 months ago • 7 comments

ref https://github.com/ggerganov/llama.cpp/discussions/499#discussioncomment-7478602

We should be able to run inference on multiple graphs, backends and devices in parallel. Currently, there are CUDA singletons that break this requirement, and there may be other problems as well.
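For illustration only (not from the issue itself), here is a minimal sketch of the usage pattern this change is meant to enable: each thread owns its own ggml context, backend instance and graph, and calls `ggml_backend_graph_compute` concurrently with the others. It assumes the ggml-backend API of that period (`ggml_backend_cpu_init`, `ggml_backend_alloc_ctx_tensors`, etc.); the CPU backend is used here, but the interesting case is a GPU backend once the CUDA singletons are removed.

```cpp
// Hypothetical sketch: two threads, each with its own ggml context,
// backend and graph, computing in parallel. This is the pattern that
// should become safe once ggml is thread-safe.
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

#include <cstdio>
#include <thread>
#include <vector>

static void worker(int id) {
    // per-thread context holding only tensor/graph metadata (no_alloc = true)
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(params);

    // trivial graph: c = a + b
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    struct ggml_tensor * c = ggml_add(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);

    // per-thread backend instance; a GPU backend (e.g. CUDA) would be the
    // case that currently breaks due to the singletons mentioned above
    ggml_backend_t backend = ggml_backend_cpu_init();
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);

    std::vector<float> x(1024, 1.0f), y(1024, 2.0f);
    ggml_backend_tensor_set(a, x.data(), 0, ggml_nbytes(a));
    ggml_backend_tensor_set(b, y.data(), 0, ggml_nbytes(b));

    // the goal of this issue: this call should be safe to run
    // concurrently from multiple threads
    ggml_backend_graph_compute(backend, gf);

    float out = 0.0f;
    ggml_backend_tensor_get(c, &out, 0, sizeof(out));
    printf("thread %d: c[0] = %.1f\n", id, out);

    ggml_backend_buffer_free(buf);
    ggml_backend_free(backend);
    ggml_free(ctx);
}

int main() {
    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return 0;
}
```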

ggerganov · Nov 05 '23 15:11