Forkoz
> int8

bitsandbytes perf on the P40 is not good. About 1/2 speed as well.
Just tested with nohalf2; if I did it right, speed definitely went up on the P6000. This is the 7b, though. `Output generated in 30.07 seconds (3.33 tokens/s, 100 tokens, context...
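For anyone comparing numbers across cards, the tokens/s figure in that log line is just generated tokens divided by wall-clock seconds. A minimal sketch to reproduce it; the helper name `tokens_per_second` is made up for illustration and is not part of any repo:

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput as generated tokens divided by elapsed seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures taken from the log line above: 100 tokens in 30.07 seconds.
rate = tokens_per_second(100, 30.07)
print(f"{rate:.2f} tokens/s")  # prints "3.33 tokens/s"
```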
Pascal is compute 6.1. Not sure how Maxwell fares on this repo; I don't think anyone has tried it yet. Pascal doesn't have an `atomicAdd` for half, though, unless you make the...
[half2-HPEC2017.pdf](https://github.com/turboderp/exllama/files/11905033/half2-HPEC2017.pdf) Supposedly there is a way to pack 2 half2 ops into a single FP32 operation and gain a speedup but I'm not sure if that is accomplished only for...
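For context on what "half2" means: it is a pair of FP16 values stored side by side in one 32-bit word, so a single packed instruction can work on both halves. The sketch below only shows that storage layout in plain Python using `struct`; it does not demonstrate any speedup, and the function names `pack_half2` and `unpack_half2` are hypothetical, chosen just for this example.

```python
import struct


def pack_half2(lo: float, hi: float) -> int:
    """Pack two FP16 values into one 32-bit word (lo in the low 16 bits)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two FP16 values."""
    return struct.unpack("<2e", struct.pack("<I", word))


word = pack_half2(1.0, 2.0)
print(hex(word))           # prints "0x40003c00" (1.0 is 0x3C00, 2.0 is 0x4000)
print(unpack_half2(word))  # prints "(1.0, 2.0)"
```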
I thought about int8 as well but int8 is missing hardware matrix matmul.
Sounds like you got further than me. I am pretty rusty on the math here. This computes the dot product, but what about the matrix product? I thought they were...
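On the dot product versus matrix product question, the general relationship is that each entry of a matrix product is the dot product of one row of the left matrix with one column of the right matrix. A rough pure-Python sketch of that idea follows; it is not how the repo's kernels are written, and `dot` and `matmul` are just illustrative names.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def matmul(A, B):
    """Matrix product built from dot products: C[i][j] = dot(row i of A, column j of B)."""
    cols = list(zip(*B))  # transpose B so each column becomes a tuple
    return [[dot(row, col) for col in cols] for row in A]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # prints "[[19, 22], [43, 50]]"
```

So a kernel that computes one dot product per output element already gives you the full matrix product once it runs over every row and column pair.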
What model are you running? It says:

```
    raise RuntimeError("Insufficient VRAM for model and cache")
RuntimeError: Insufficient VRAM for model and cache
```
Might be stuck splitting manually?
I just moved from an AMD card and had Stable Diffusion running on it. To get it working I had to install the AMD driver and ROCm. Then I installed...
Yes. Just use the virtual environment that works for Stable Diffusion. It uses mostly the same things. There is some stuff you have to do, like set the environment variable...