llama.cpp
ggml : become thread-safe
ref https://github.com/ggerganov/llama.cpp/discussions/499#discussioncomment-7478602
We should be able to run inference on multiple graphs, backends, and devices in parallel. Currently there are CUDA singletons that break this requirement, and there may be other shared global state with the same problem.
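To illustrate the goal, here is a minimal, hypothetical C++ sketch (the names below are illustrative, not ggml's actual API): a process-wide singleton holding backend/device state forces every caller through one piece of shared mutable state, while moving that state into a per-graph compute context lets independent graphs run on different devices from different threads without interfering.

```cpp
// Hypothetical sketch of the singleton problem and the per-context fix.
// These types are illustrative only; they are not ggml's real API.

#include <cstdio>
#include <thread>
#include <vector>

// Anti-pattern: one global "current device" shared by every caller.
// Two threads computing on different devices race on this state.
struct GlobalBackendState {
    int current_device = 0;   // last device selected, shared by all threads
} g_backend;                  // process-wide singleton

// Preferred pattern: each compute context owns its backend state,
// so concurrent contexts never touch shared mutable globals.
struct ComputeContext {
    int device;               // device this context is bound to
    explicit ComputeContext(int dev) : device(dev) {}

    void run_graph(const char *name) const {
        // Device selection, streams, scratch buffers, etc. would all
        // live inside the context instead of in g_backend.
        std::printf("graph %s running on device %d\n", name, device);
    }
};

int main() {
    // Two independent graphs on two devices, computed in parallel.
    ComputeContext ctx0(0), ctx1(1);

    std::vector<std::thread> workers;
    workers.emplace_back([&] { ctx0.run_graph("graph-A"); });
    workers.emplace_back([&] { ctx1.run_graph("graph-B"); });
    for (auto &t : workers) t.join();

    return 0;
}
```

The design point is the same one the issue asks for: anything a backend needs at compute time should hang off the graph/backend/context objects passed in by the caller rather than off globals, so that multiple graphs, backends, and devices can be used concurrently.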