Forkoz

Results: 351 comments by Forkoz

You all have correct GPU-accelerated bitsandbytes on Windows, right? And proper CUDA-enabled torch, etc.? Because it sounds like it isn't so.
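
Something like this quick check is what I mean (a minimal sketch, assuming torch and bitsandbytes are already installed; a CPU-only bitsandbytes build will complain during import):

```python
import torch

# Confirm torch was built with CUDA and actually sees a GPU.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))

# Importing bitsandbytes prints its CUDA setup messages; a CPU-only build
# will warn here instead of finding the GPU library for your CUDA version.
import bitsandbytes  # noqa: F401
```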

This model already works. It is just a fine-tuned pythia-12b.

So this is why I couldn't load the models after I fixed the ) bug. But now we can quantize with different group sizes. Which one is the best for...
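
For reference, this is the knob I mean (a sketch using AutoGPTQ rather than the GPTQ-for-LLaMa scripts the thread is about, just to illustrate it; model name and calibration text are placeholders): weights get quantized in blocks of `group_size` columns, each with its own scale/zero-point, so smaller groups usually quantize more accurately but make a bigger file.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # small model just for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)

# try 32 / 64 / 128, or -1 for one group per whole row
quant_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quant_config)

examples = [tokenizer("some calibration text goes here")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit-128g")
```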

6 is your entire GPU; leave some room for the browser and Windows/Xorg/etc.
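
If you want to see how much headroom you actually have, something like this works (illustrative only; the ~2 GiB reserve is just a rule of thumb, not a hard number):

```python
import torch

# Query total VRAM on GPU 0 and leave ~2 GiB for the desktop/browser
# before picking a memory budget for the model.
total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
suggested = max(int(total_gib) - 2, 1)
print(f"Total VRAM: {total_gib:.1f} GiB, suggested budget: {suggested} GiB")
```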

Can we use a value like 3.5 for this? I've only tried whole numbers, but it sticks when I put --gpu-memory 20 or 22. It used to go over before...
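
As I understand it, that flag ends up as an Accelerate/Transformers max_memory budget, so a fractional cap can always be written in MiB instead (a sketch; the model name is a placeholder):

```python
from transformers import AutoModelForCausalLM

# ~3.5 GiB cap on GPU 0, with overflow layers offloaded to CPU RAM.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",
    device_map="auto",
    max_memory={0: "3500MiB", "cpu": "30GiB"},
)
```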

Make sure to clear the context and use the exact same prompts/settings, preferably in a mode where you get the exact same response back, i.e. disable do_sample. Otherwise it gets really...
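
In plain transformers terms, that's greedy decoding: same prompt plus do_sample=False should give identical tokens every run, which makes before/after comparisons meaningful (a sketch with gpt2 standing in for whatever model you're testing):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The quick brown fox", return_tensors="pt")
# Greedy decoding: deterministic, so the output is reproducible.
out = model.generate(**inputs, do_sample=False, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```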

Try the matmul kernels, they use less VRAM.

So now there are NeoX and GPT-J and regular GPTQ (OPT, LLaMA, BLOOM)... all separate repos with separate kernels. And I think the kernels may not work across versions and all require...

Transformers/Accelerate does this in CPU mode too? Ouch. Edit: Hey, so, a shot in the dark: did you try with https://github.com/zphang/transformers.git@68d640f7c368bcaaaecfc678f11908ebbd3d6176 ? That transformers build would use multiple cores for me for...

You can't use Hugging Face without generating a login token. You have to download those files manually.
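
The token route looks roughly like this (a sketch, assuming huggingface_hub is installed; the token, repo, and file names are placeholders), the alternative being grabbing the files by hand from the model page:

```python
from huggingface_hub import login, hf_hub_download

# Token generated under your account's access-token settings on huggingface.co.
login(token="hf_xxx")

# Pull a single file from the repo once authenticated.
path = hf_hub_download(repo_id="some-org/some-model", filename="config.json")
print(path)
```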