llama.cpp
Fixed WSL CUDA OOM error
In WSL, CUDA has a limit on pinned (page-locked) memory, so allocating a large amount of pinned memory fails. However, this can be worked around in the program by falling back to ordinary pageable memory when the pinned allocation fails.
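A minimal sketch of that fallback, assuming the error also needs to be cleared as discussed below (the function name `host_malloc_with_fallback` is hypothetical, not the actual llama.cpp symbol):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Try to allocate pinned host memory; if WSL's pinned-memory limit makes
// that fail, clear the CUDA error state and fall back to pageable memory.
void * host_malloc_with_fallback(size_t size) {
    void * ptr = nullptr;
    cudaError_t err = cudaMallocHost(&ptr, size);
    if (err != cudaSuccess) {
        // Clear the error so later CUDA calls don't see the stale failure;
        // forgetting to do this was the oversight mentioned below.
        (void) cudaGetLastError();
        fprintf(stderr, "warning: pinned allocation of %zu bytes failed, "
                        "falling back to pageable memory\n", size);
        ptr = malloc(size);
    }
    return ptr;
}
```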
I just opened a large PR for multi-GPU support: https://github.com/ggerganov/llama.cpp/pull/1607
This is good; this was the intended behavior when the host malloc fails, and not clearing the error was an oversight.