Paisley Mahern
Results: 2 comments
The symlink trick seemed to just make my system fall back to the CPU instead of using CUDA, and the worker wouldn't start if I turned on quantization. Instead, I added `export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH`...
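If it helps anyone checking whether that export actually took effect, here is a rough sketch that looks for the WSL library directory in `LD_LIBRARY_PATH` from the process that launches the worker. The helper name `has_dir` is made up for illustration; only the path comes from the comment above. It only inspects the variable and does not confirm that CUDA itself loads.

```python
import os

# Directory from the export line in the comment above.
WSL_LIB_DIR = "/usr/lib/wsl/lib"


def has_dir(path_list: str, directory: str) -> bool:
    """Return True if `directory` is one of the colon-separated entries.

    Hypothetical helper: trailing slashes are ignored and empty entries
    are skipped, so "" never matches anything.
    """
    entries = [entry.rstrip("/") for entry in path_list.split(":") if entry]
    return directory.rstrip("/") in entries


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    if has_dir(current, WSL_LIB_DIR):
        print(f"{WSL_LIB_DIR} is on LD_LIBRARY_PATH")
    else:
        print(f"{WSL_LIB_DIR} is missing from LD_LIBRARY_PATH")
```

Run it from the same shell session that starts the worker, since an export in one shell is not visible in another.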
You can turn on quantization to reduce the VRAM needed (this will reduce accuracy as well). I was testing it with 4-bit quantization, but the 13B model might fit...
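For a rough feel of why quantization cuts VRAM, here is a back-of-the-envelope sketch. It assumes "13B" means about 13 billion parameters and counts only the weights; activations, KV cache, and framework overhead are ignored, so real usage will be higher. The function name is just for illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def estimate_weight_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every parameter is stored at `bits_per_weight` bits,
    with no per-block scales or other quantization metadata.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / GIB


if __name__ == "__main__":
    n_params = 13e9  # assumed parameter count for a "13B" model
    for bits in (16, 8, 4):
        size = estimate_weight_gib(n_params, bits)
        print(f"{bits:>2}-bit weights: ~{size:.1f} GiB")
```

Under these assumptions, 4-bit weights take about a quarter of the space of 16-bit weights, which is where the savings come from.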