youwot

Results 1 comments of youwot

I am pretty sure exllama only works for gpu models. "A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern...