Forkoz
You all have a correct GPU-accelerated bitsandbytes on Windows, right? And a proper CUDA build of torch, etc.? Because it sounds like it isn't so.
This model already works. It is just pythia-12b fine-tuned.
So this is why I couldn't load the models after I fixed the ) bug. But now we can quantize with different group sizes. Which one is the best for...
6 is your entire GPU; leave some room for the browser and windows/xorg/etc.
Can we use a value like 3.5 for this? I only tried whole numbers, but it sticks when I put --gpu-memory 20 or 22. It used to go over before...
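A minimal sketch of how a fractional limit could be expressed, assuming the --gpu-memory budget ultimately ends up as an Accelerate-style max_memory string (like "20GiB") passed to from_pretrained with device_map="auto" — the helper name and the exact plumbing inside the webui are assumptions here:

```python
def gpu_memory_string(gib: float) -> str:
    """Convert a (possibly fractional) GiB budget to a max_memory string.

    Accelerate accepts sizes as strings with a unit suffix, so a
    fractional GiB value can be stated exactly in MiB instead.
    (Hypothetical helper; the webui flag itself may only parse integers.)
    """
    if float(gib).is_integer():
        return f"{int(gib)}GiB"
    return f"{int(gib * 1024)}MiB"  # 3.5 GiB -> "3584MiB"

# This dict shape (GPU index and "cpu" keys) is what transformers'
# from_pretrained accepts as max_memory when device_map="auto".
max_memory = {0: gpu_memory_string(3.5), "cpu": "30GiB"}
print(max_memory)  # {0: '3584MiB', 'cpu': '30GiB'}
```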
Make sure to clear the context and use the exact same prompts/settings, preferably in a mode where you get the exact same response back, i.e. disable do_sample. Otherwise it gets really...
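A sketch of the kwargs that comparison would use with a Hugging Face model.generate() call — do_sample, num_beams, and max_new_tokens are real generate() parameters; the helper for comparing two runs is just an illustrative assumption:

```python
# With do_sample=False, generate() does greedy decoding (always the
# argmax token), so sampling knobs like temperature/top_p are ignored
# and two runs on the identical context should produce identical text.
deterministic = dict(
    do_sample=False,    # disable sampling -> greedy decoding
    num_beams=1,        # plain greedy, no beam search
    max_new_tokens=200,
)

def same_response(a: str, b: str) -> bool:
    """Hypothetical check: compare two generations, ignoring edge whitespace."""
    return a.strip() == b.strip()

print(deterministic["do_sample"])  # False
```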
Try the matmul kernels, they use less VRAM.
So now there is NeoX and GPT-J and regular GPTQ (OPT, LLaMA, BLOOM)... all separate repos with separate kernels. And I think the kernels may not work across versions and all require...
Transformers/Accelerate does this in CPU mode too? Ouch. edit: Hey, so a shot in the dark, did you try with https://github.com/zphang/transformers.git@68d640f7c368bcaaaecfc678f11908ebbd3d6176 ? That transformers build would use multiple cores for me for...
You can't use Hugging Face without generating a login token. You have to download those files manually.
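For the manual route, a small sketch of the Hub's direct-download URL pattern — the /resolve/&lt;revision&gt;/&lt;filename&gt; path is the Hub's standard raw-file endpoint; the repo id below is a placeholder, and for gated repos the request would additionally need an "Authorization: Bearer &lt;token&gt;" header:

```python
def hf_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for a file in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# "some-org/some-model" is a placeholder repo id, not a real model.
print(hf_file_url("some-org/some-model", "config.json"))
# https://huggingface.co/some-org/some-model/resolve/main/config.json
```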