Alex Redden
Okay, should be fixed now! From here: 153dd913d02f05023fdf3b6c24a16d737f3c1359 - let me know if there are any further issues @lvjin521
Ah- this is probably the result of the fused qkv LoRA not being applied correctly. This is actually good, since I can use it to test whether my new implementation...
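As a rough sanity check for that, here's a minimal sketch (not the repo's actual code; shapes, names, and the scale are assumptions) of how per-projection LoRA weights have to be folded into the row slices of a fused qkv weight:

```python
import torch

# Hypothetical shapes: hidden = 3072, fused qkv weight is (3 * hidden, hidden),
# with rows [0:hidden] = q, [hidden:2*hidden] = k, [2*hidden:3*hidden] = v.
hidden, rank = 3072, 16
fused_qkv_weight = torch.randn(3 * hidden, hidden)

# A LoRA trained against separate q/k/v projections ships three (A, B) pairs;
# each low-rank delta has to land in the correct row slice of the fused weight.
lora_pairs = {  # hypothetical per-projection LoRA weights
    "q": (torch.randn(rank, hidden), torch.randn(hidden, rank)),
    "k": (torch.randn(rank, hidden), torch.randn(hidden, rank)),
    "v": (torch.randn(rank, hidden), torch.randn(hidden, rank)),
}
scale = 1.0  # alpha / rank, assumed

for i, name in enumerate(("q", "k", "v")):
    A, B = lora_pairs[name]
    delta = (B @ A) * scale                     # (hidden, hidden) low-rank update
    rows = slice(i * hidden, (i + 1) * hidden)  # row slice owned by this projection
    fused_qkv_weight[rows] += delta
```

If the slices (or the q/k/v ordering) are off, the model still runs, but the LoRA lands on the wrong projection, which is exactly the kind of bug that only shows up in the output images.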
The slowdown is due to torch.compile compilation; it should speed up after that, but the initial generation may take a while, and it may also take a while for each...
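For context, here's a tiny sketch of the warm-up behavior (placeholder model, not the repo's pipeline): the first call through a compiled module pays the compile cost, and later calls with the same shapes reuse the compiled graph.

```python
import time
import torch

# Placeholder module just to show the torch.compile warm-up pattern.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096)
).cuda().half()
model = torch.compile(model)

x = torch.randn(8, 4096, device="cuda", dtype=torch.half)

for step in range(3):
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    with torch.no_grad():
        model(x)
    torch.cuda.synchronize()
    print(f"call {step}: {time.perf_counter() - t0:.3f}s")  # call 0 includes compilation
```

New input shapes (e.g. a different resolution or batch size) can trigger recompilation, which is why the first generation at each new size is slow again.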
Yeah- so essentially using nightly is significantly better.
I think it depends. Sometimes compilation will be more costly than at other times, depending on the torch version. I think at the time nightly was 2.5.0 or 2.5.1, I'm not sure. So,...
Hmm, I actually have no idea why that error occurred. Interesting, I'll look into it.
I am a bit confused. Where are you getting 3.32 iterations per second? Total generation time doesn't mean as much as the it/s speed. You also need to take into...
The speeds you are getting look normal to me. The model does 3.32 forward passes per second, which is relatively close to the max TFLOPS for a 4090 if you're generating...
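To put that it/s number in perspective, here's some rough arithmetic (the step count and overhead are made-up for illustration, not measured values):

```python
# Relating it/s to wall-clock generation time.
its_per_second = 3.32   # forward passes per second, from the discussion above
num_steps = 28          # hypothetical number of denoising steps
overhead_s = 2.0        # hypothetical text-encoder / VAE / host overhead

denoise_s = num_steps / its_per_second
print(f"denoising: {denoise_s:.1f}s, total: {denoise_s + overhead_s:.1f}s")
# ~8.4s of denoising, so the sampling loop dominates the wall-clock time
```

That's why total generation time alone is hard to compare across setups: it mixes the per-step speed with the step count and one-off overhead.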
Well, float8 does affect precision, and there will be added error, so you won't get the same image as you would if it were not quantized. If you're loading the...
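To get a feel for the size of that error, here's a small sketch (a per-tensor scaled cast on a random weight, just to illustrate why the outputs won't match bf16 exactly):

```python
import torch

w = torch.randn(4096, 4096, dtype=torch.bfloat16)

scale = w.abs().max() / 448.0                  # 448 is the max normal value of e4m3
w_fp8 = (w / scale).to(torch.float8_e4m3fn)    # quantize to float8
w_deq = w_fp8.to(torch.bfloat16) * scale       # dequantize back

err = (w - w_deq).abs()
print(f"mean abs error: {err.mean().item():.2e}, max abs error: {err.max().item():.2e}")
```

The per-weight rounding error is small, but it accumulates over every matmul in every denoising step, so the final image drifts from the unquantized result.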
It's the CublasLinear layers. It's a repo I made which allows matmuls to run with half-precision accumulate within the matmul kernel, which doubles the TFLOPS for most consumer GPUs....
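This isn't the CublasLinear API itself, just a rough illustration of the idea using the built-in PyTorch knob that lets cuBLAS use reduced-precision reductions for fp16 GEMMs; the repo goes further and runs the accumulation itself in half precision inside the kernel.

```python
import torch
from torch.utils import benchmark

a = torch.randn(8192, 8192, device="cuda", dtype=torch.half)
b = torch.randn(8192, 8192, device="cuda", dtype=torch.half)

for reduced in (False, True):
    # Allow (or forbid) reduced-precision reductions inside fp16 matmuls.
    torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = reduced
    t = benchmark.Timer(stmt="a @ b", globals={"a": a, "b": b}).timeit(50)
    print(f"reduced-precision reduction={reduced}: {t.median * 1e3:.2f} ms")
```

Whether this particular flag moves the needle depends on the GPU and the cuBLAS heuristics; the point is just that trading accumulation precision for throughput is where the extra TFLOPS on consumer cards come from.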