DanielWe2

No problem. What I don't understand: the GPTQ CUDA version works with a context length of 2048 (in the benchmarks that output ppl, i.e. perplexity). So does your version use a little more memory?