Paisley Mahern

2 comments by Paisley Mahern

Doing the symlink trick seemed to just make my system fall back to the CPU instead of using CUDA, and the worker won't start if I turn on quantization. Instead, I added `export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH`...
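For anyone who wants a slightly more defensive version of that export, here is a minimal sketch. The `prepend_lib_path` helper is a hypothetical name of mine, not something from the project; it only adds the directory if it exists and is not already listed, and it avoids leaving a trailing `:` when `LD_LIBRARY_PATH` starts out empty (an empty entry is treated as the current directory by the dynamic loader). Where you put it (shell profile, launch script, etc.) depends on your setup.

```shell
# Hypothetical helper (not part of any project): prepend a directory to
# LD_LIBRARY_PATH only if it exists and is not already in the list.
prepend_lib_path() {
    dir="$1"
    # Skip directories that do not exist on this machine.
    [ -d "$dir" ] || return 1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) export LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
}

# The WSL library directory from the comment above.
prepend_lib_path /usr/lib/wsl/lib || echo "not found: /usr/lib/wsl/lib" >&2
```

Calling it twice is harmless, since the second call sees the directory is already in the path.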

You can turn on quantization to reduce the VRAM needed (this will reduce the accuracy as well). I was testing it with 4-bit quantization, but the 13B model might fit...
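As a rough back-of-the-envelope check on why quantization helps, the weights alone take about (parameters in billions) × (bits per weight) / 8 gigabytes. The sketch below just does that arithmetic; `estimate_weights_gb` is a hypothetical helper name, and the number is a lower bound only, since it ignores activations, the KV cache, and framework overhead.

```shell
# Hypothetical helper: weights-only memory estimate in GB.
# $1 = parameter count in billions, $2 = bits per weight.
estimate_weights_gb() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 13 16   # prints 26.0
estimate_weights_gb 13 8    # prints 13.0
estimate_weights_gb 13 4    # prints 6.5
```

Actual usage will be higher than these figures, so treat them as a floor when deciding whether a model might fit.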