Alex Redden
Okay, should be fixed now! From here: 153dd913d02f05023fdf3b6c24a16d737f3c1359 - let me know if there are any further issues @lvjin521
Ah- this is probably the result of the fused qkv LoRA not being applied correctly. This is actually good, since I can use it to test whether my new implementation...
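As a rough sanity check for that, here's a minimal sketch (not the repo's actual code; shapes, names, and the scale are assumptions) of how per-projection LoRA weights have to be folded into the row slices of a fused qkv weight:

```python
import torch

# Hypothetical shapes: hidden = 3072, fused qkv weight is (3 * hidden, hidden),
# with rows [0:hidden] = q, [hidden:2*hidden] = k, [2*hidden:3*hidden] = v.
hidden, rank = 3072, 16
fused_qkv_weight = torch.randn(3 * hidden, hidden)

# A LoRA trained against separate q/k/v projections ships three (A, B) pairs;
# each low-rank delta has to land in the correct row slice of the fused weight.
lora_pairs = {  # hypothetical per-projection LoRA weights
    "q": (torch.randn(rank, hidden), torch.randn(hidden, rank)),
    "k": (torch.randn(rank, hidden), torch.randn(hidden, rank)),
    "v": (torch.randn(rank, hidden), torch.randn(hidden, rank)),
}
scale = 1.0  # alpha / rank, assumed

for i, name in enumerate(("q", "k", "v")):
    A, B = lora_pairs[name]
    delta = (B @ A) * scale                     # (hidden, hidden) low-rank update
    rows = slice(i * hidden, (i + 1) * hidden)  # row slice owned by this projection
    fused_qkv_weight[rows] += delta
```

If the slices (or the q/k/v ordering) are off, the model still runs, but the LoRA lands on the wrong projection, which is exactly the kind of bug that only shows up in the output images.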
The slowdown is due to torch.compile compilation; it should speed up after that, but the initial generation may take a while, and it may also take a while for each...
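For context, here's a tiny sketch of the warm-up behavior (placeholder model, not the repo's pipeline): the first call through a compiled module pays the compile cost, and later calls with the same shapes reuse the compiled graph.

```python
import time
import torch

# Placeholder module just to show the torch.compile warm-up pattern.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096)
).cuda().half()
model = torch.compile(model)

x = torch.randn(8, 4096, device="cuda", dtype=torch.half)

for step in range(3):
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    with torch.no_grad():
        model(x)
    torch.cuda.synchronize()
    print(f"call {step}: {time.perf_counter() - t0:.3f}s")  # call 0 includes compilation
```

New input shapes (e.g. a different resolution or batch size) can trigger recompilation, which is why the first generation at each new size is slow again.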
Yeah- so essentially using nightly is significantly better.
I think it depends. Sometimes compilation will be more costly than at other times, depending on the torch version. I think at the time nightly was 2.5.0 or 2.5.1, I'm not sure. So,...
Hmm, I actually have no idea why that error occurred. Interesting, I'll look into it.
I am a bit confused. Where are you getting 3.32 iterations per second? Total generation time doesn't mean as much as the it/s speed. You also need to take into...
The speeds you are getting look normal to me. The model does 3.32 forward passes per second, which is relatively close to the max TFLOPS for a 4090 if you're generating...
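To put that it/s number in perspective, here's some rough arithmetic (the step count and overhead are made-up for illustration, not measured values):

```python
# Relating it/s to wall-clock generation time.
its_per_second = 3.32   # forward passes per second, from the discussion above
num_steps = 28          # hypothetical number of denoising steps
overhead_s = 2.0        # hypothetical text-encoder / VAE / host overhead

denoise_s = num_steps / its_per_second
print(f"denoising: {denoise_s:.1f}s, total: {denoise_s + overhead_s:.1f}s")
# ~8.4s of denoising, so the sampling loop dominates the wall-clock time
```

That's why total generation time alone is hard to compare across setups: it mixes the per-step speed with the step count and one-off overhead.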
Well, float8 does affect precision, and there will be added error, so you won't get the same image as you would if it were not quantized. If you're loading the...
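To get a feel for the size of that error, here's a small sketch (a per-tensor scaled cast on a random weight, just to illustrate why the outputs won't match bf16 exactly):

```python
import torch

w = torch.randn(4096, 4096, dtype=torch.bfloat16)

scale = w.abs().max() / 448.0                  # 448 is the max normal value of e4m3
w_fp8 = (w / scale).to(torch.float8_e4m3fn)    # quantize to float8
w_deq = w_fp8.to(torch.bfloat16) * scale       # dequantize back

err = (w - w_deq).abs()
print(f"mean abs error: {err.mean().item():.2e}, max abs error: {err.max().item():.2e}")
```

The per-weight rounding error is small, but it accumulates over every matmul in every denoising step, so the final image drifts from the unquantized result.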
It's the CublasLinear layers. It's a repo I made which allows matmuls to run with half-precision accumulate within the matmul kernel, which doubles the TFLOPS for most consumer GPUs....
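This isn't the CublasLinear API itself, just a rough illustration of the idea using the built-in PyTorch knob that lets cuBLAS use reduced-precision reductions for fp16 GEMMs; the repo goes further and runs the accumulation itself in half precision inside the kernel.

```python
import torch
from torch.utils import benchmark

a = torch.randn(8192, 8192, device="cuda", dtype=torch.half)
b = torch.randn(8192, 8192, device="cuda", dtype=torch.half)

for reduced in (False, True):
    # Allow (or forbid) reduced-precision reductions inside fp16 matmuls.
    torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = reduced
    t = benchmark.Timer(stmt="a @ b", globals={"a": a, "b": b}).timeit(50)
    print(f"reduced-precision reduction={reduced}: {t.median * 1e3:.2f} ms")
```

Whether this particular flag moves the needle depends on the GPU and the cuBLAS heuristics; the point is just that trading accumulation precision for throughput is where the extra TFLOPS on consumer cards come from.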