Steward Garcia

92 comments by Steward Garcia

@b-albar Could this work with infinite negatives (custom attention mask)? It seems like I have to reshape the array to (batch_size, num_heads, seq_len, seq_len). It would be good if broadcasting...

Currently, im2col is being used for convolutions, which consumes a very high amount of RAM during the VAE phase. I have been working on a kernel that merges im2col and...
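To give a rough sense of why im2col is memory-hungry, here is a minimal sketch that estimates the size of the unrolled im2col buffer for a single convolution. The function names and the layer dimensions (128 channels at 512x512, 3x3 kernel, fp16) are hypothetical and chosen only for illustration; they are not taken from the actual VAE or from any library.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution (no dilation).
    return (size + 2 * padding - kernel) // stride + 1


def im2col_bytes(channels, height, width, kernel_h, kernel_w,
                 stride=1, padding=0, bytes_per_elem=2):
    # im2col unrolls every receptive field into a column, so the buffer holds
    # (channels * kernel_h * kernel_w) values for each output position.
    out_h = conv_out_size(height, kernel_h, stride, padding)
    out_w = conv_out_size(width, kernel_w, stride, padding)
    return channels * kernel_h * kernel_w * out_h * out_w * bytes_per_elem


# Hypothetical layer: 128 channels, 512x512 feature map, 3x3 kernel, fp16.
input_bytes = 128 * 512 * 512 * 2
buffer_bytes = im2col_bytes(128, 512, 512, 3, 3, stride=1, padding=1)

print(input_bytes // 2**20, "MiB input")          # 64 MiB input
print(buffer_bytes // 2**20, "MiB im2col buffer")  # 576 MiB im2col buffer
```

With a 3x3 kernel at stride 1, the temporary buffer is nine times the size of the input it was built from, which is the overhead a fused kernel would avoid.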

The truth is that, yes, the CPU backend isn't as optimized as it could be; perhaps it's the im2col kernel since it overuses memory accesses. In all ML software, the...

@leejet I believe that is done by adding noise only to the white part of the latent image, and in the decoder, keeping the pixels of the black part unchanged....
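The masking idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the latent and mask are flat lists, a mask value of 1 means white (regenerate) and 0 means black (keep), and the helper names `add_masked_noise` and `keep_unmasked` are hypothetical rather than part of any existing codebase.

```python
import random


def add_masked_noise(latent, mask, sigma, rng):
    # Add Gaussian noise only where the mask is 1 (white area);
    # positions with mask 0 (black area) are left untouched.
    return [x + m * sigma * rng.gauss(0.0, 1.0) for x, m in zip(latent, mask)]


def keep_unmasked(original, generated, mask):
    # Take the generated value in the white area and the original value
    # in the black area.
    return [m * g + (1.0 - m) * o for o, g, m in zip(original, generated, mask)]


latent = [0.5, -1.0, 2.0, 0.0]
mask = [0.0, 1.0, 1.0, 0.0]

noisy = add_masked_noise(latent, mask, sigma=1.0, rng=random.Random(0))
restored = keep_unmasked(latent, noisy, mask)
```

Only the positions where the mask is 1 differ between `latent` and `restored`; the rest are carried over unchanged.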

@leejet I think we should first solve that problem before considering adding the inpainting feature. Inpainting models require a latent image with 9 input channels, 4 for the usual channels,...

@mzwing ~~I'll try to implement the missing scheduler, but I'm not exactly sure which of the models you've uploaded to Hugging Face I should try to see if I get...

@bssrdf > Is im2col going to be skipped? Or done on the fly? I am going to do something similar to flash attention. I am going to divide the blocks...

@leejet I'm not sure if this could lead to memory leaks since it needs to be created for each model, and 10 MB is a lot to store just the metadata...

For now, the kernel I created to avoid the overhead of im2col results in a 50% reduction in performance, even though it's only applied to the operation that generates a...

@Green-Sky I'll try to run tests on an RTX 3060, mainly with CUDA Toolkit 11.8. The truth is that there isn't a standard API for Stable Diffusion. For example, the ComfyUI...