Forkoz
I just saw... I loaded it afterwards and got the CUDA assert. Then I turned off fused attention and it loads, but generation feels really slow. ok... nvm... I think...
It's choking at 3k context, down to 3-4 it/s even. The merged copy shouldn't have this problem, but it's 128g with no act-order and I wanted to try the original...
QLoRA has mildly better perplexity, and that probably carries over to training. But as you say, the same modules can be targeted and trained faster. You still sort of need...
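In case it helps, this is roughly what I mean by targeting the same modules; a sketch with peft, where the model path is a placeholder and the target list is the usual Llama-style set of linear layers:

```python
# Sketch only: "org/llama-13b" is a placeholder, and the target_modules
# list is the typical Llama-style all-linear set that QLoRA-style runs use.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("org/llama-13b", device_map="auto")
cfg = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, cfg)
model.print_trainable_parameters()  # only the adapter weights train
```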
Uses AWQ. I wonder about perplexity and memory performance of that format vs GPTQ.
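The standard sliding-window perplexity loop would answer the quality half at least; a rough sketch assuming an HF-loadable checkpoint (the model id, stride, and context length are placeholders):

```python
# Sketch: sliding-window perplexity on wikitext-2, per the usual HF recipe.
# model_id is a placeholder; swap in the AWQ and GPTQ checkpoints to compare.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

model_id = "org/llama-13b-awq"  # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

ctx, stride = 2048, 512
nlls = []
for begin in range(0, ids.size(1) - ctx, stride):
    chunk = ids[:, begin : begin + ctx].to(model.device)
    labels = chunk.clone()
    labels[:, :-stride] = -100  # only score the last `stride` tokens
    with torch.no_grad():
        nlls.append(model(chunk, labels=labels).loss * stride)
print("ppl:", torch.exp(torch.stack(nlls).sum() / (len(nlls) * stride)).item())
```

Memory is easier: just watch `torch.cuda.max_memory_allocated()` around the same loop.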
The paper probably doesn't compare against optimized exllama at 64g. Remember the SPQR paper doing something similar. I've noticed a lot of authors show very favorable results in their graphs and creatively omit...
ime airoboros doesn't use compress_pos_embed, and I found the best perplexity using alpha 2.7. The default 100k base gives worse results when I ran it as a lora...
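For reference, this is how alpha maps to the RoPE base as I understand it; treat the formula as an assumption (my recollection of the NTK-aware scaling), with head_dim 128 for llama:

```python
# Assumed NTK-aware alpha scaling: the RoPE base gets raised by
# alpha ** (d / (d - 2)), with d = head dimension (128 for llama).
def ntk_rope_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    return base * alpha ** (head_dim / (head_dim - 2))

print(ntk_rope_base(2.7))  # ~27400, vs the stock 10000 base
```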
Your driver is probably fine. It's the venv. I too have cuda 12 on the system and then cuda 11.8 in the venv. I had to download all the cuda...
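Easy way to check what the venv's torch was actually built against, since nvidia-smi only shows the driver-level version:

```python
# The CUDA version torch was compiled with can differ from what
# nvidia-smi reports; this prints what the venv actually has.
import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # e.g. 11.8 inside the venv
print("available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```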
This is why I like conda. A fresh environment with new cu118 torch and the reqs usually fixes things. Although I've yet to mess up a single conda env or venv...
That `cannot find -lespeak` error means the linker can't find libespeak. You need to install the espeak lib.
ime, triton was never faster for anything. High compute-capability requirements that exclude older cards, and slower speed on top, oh my. The only one that has pulled off merging adapters into quantized models is GGUF...
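Everywhere else you have to merge in full precision and re-quantize afterwards; a minimal sketch with peft (both paths are placeholders):

```python
# Sketch: the standard full-precision merge path. Placeholder paths.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("org/llama-13b", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("llama-13b-merged")  # quantize this output afterwards
```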