Forkoz

Results 411 comments of Forkoz

> int8 bitsandbytes performance on the P40 is not good. About half speed as well.

Just tested with nohalf2, if I did it right, it definitely went up on P6000. This is the 7b though. `Output generated in 30.07 seconds (3.33 tokens/s, 100 tokens, context...

Pascal is compute capability 6.1. Not sure how Maxwell fares on this repo; I don't think anyone has tried it yet. Pascal doesn't have an `atomicAdd` for `half` though, unless you make the...

[half2-HPEC2017.pdf](https://github.com/turboderp/exllama/files/11905033/half2-HPEC2017.pdf) Supposedly there is a way to pack two half operations into a single 32-bit `half2` operation and gain a speedup, but I'm not sure if that is accomplished only for...
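To make the packing idea concrete, here is a minimal sketch of the storage layout only: two 16-bit half-precision values sharing one 32-bit word, which is what a `half2` holds. It is a plain Python emulation using the standard `struct` module, not CUDA, and it shows no speedup by itself. The function names `pack_half2` and `unpack_half2` are hypothetical and not taken from the repo or the linked paper.

```python
import struct

def pack_half2(lo, hi):
    """Pack two half-precision floats into one 32-bit word (lo in the low 16 bits)."""
    # "<2e" writes two little-endian IEEE 754 half floats (4 bytes total);
    # "<I" reinterprets those same 4 bytes as one unsigned 32-bit integer.
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]

def unpack_half2(word):
    """Split a 32-bit word back into its two half-precision floats."""
    return struct.unpack("<2e", struct.pack("<I", word))

word = pack_half2(1.0, 2.0)
print(hex(word))           # 0x40003c00 (2.0 is 0x4000, 1.0 is 0x3c00)
print(unpack_half2(word))  # (1.0, 2.0)
```

On hardware with fast `half2` arithmetic, one instruction operates on both halves of such a word at once; whether that helps on a given card is exactly the open question here.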

I thought about int8 as well, but int8 is missing hardware matrix multiplication.

Sounds like you got further than me. I am pretty rusty on the math here. This computes the dot product, but what about the matrix product? I thought they were...
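On the dot product versus matrix product question, the two are directly related: each entry of a matrix product is the dot product of one row of the left matrix with one column of the right matrix. A minimal pure-Python sketch of that relationship follows. The helper names `dot` and `matmul` are illustrative only and are not the kernel code being discussed.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def matmul(A, B):
    """Matrix product built from dot products: C[i][j] = dot(row i of A, column j of B)."""
    cols_B = list(zip(*B))  # transpose B so each column is easy to iterate
    return [[dot(row, col) for col in cols_B] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

So a kernel that computes dot products can, in principle, produce a matrix product by running once per (row, column) pair. Real kernels reorganize this work for memory reuse, but the arithmetic is the same.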

What model are you running? Because it says:

```
    raise RuntimeError("Insufficient VRAM for model and cache")
RuntimeError: Insufficient VRAM for model and cache
```

Might be stuck splitting manually?

I just moved from an AMD card and had Stable Diffusion running on it. To get it working I had to install the AMD driver and ROCm. Then I installed...

Yes. Just use the virtual environment that works for Stable Diffusion. It uses mostly the same things. There is some stuff you have to do, like set the environment variable...