DanielWe2 comments

DanielWe2

Needs more VRAM than normal GPTQ CUDA version?

No problem. What I don't understand: the GPTQ CUDA version works with a context length of 2048 (in the benchmarks that output ppl). So does your version use a little more memory?