stable-diffusion.cpp Much higher RAM usage (2-3 times) compared to FastSDCPU when using the exact same models/settings

Open JohnAlcatraz opened this issue 1 year ago • 2 comments

Currently stable-diffusion.cpp seems to use far more RAM than https://github.com/rupeshs/fastsdcpu (written in Python) for the same result.

I compared the Dreamshaper LCM model + TAESD at 5 steps and a resolution of 512x512 on stable-diffusion.cpp vs FastSDCPU, running on the CPU.

The speed is identical: I get ~4.4 s/it with both projects.

But stable-diffusion.cpp uses a peak of 2 GB RAM, or 1.6 GB with flash attention enabled, while FastSDCPU only uses a peak of 700 MB RAM. So stable-diffusion.cpp needs 2-3x more RAM for the same result.

It looks like some significant optimizations would be possible in stable-diffusion.cpp that would make it much more memory efficient.

May 12 '24 02:05 JohnAlcatraz

Currently, im2col is being used for convolutions, which consumes a very high amount of RAM during the VAE phase.
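To give a rough sense of why im2col is memory-hungry, here is a back-of-the-envelope sketch in Python. The layer shape (128 channels at 512x512, 3x3 kernel) and the 16-bit element size are assumptions chosen for illustration, not values taken from stable-diffusion.cpp, and the helper names are hypothetical.

```python
def im2col_buffer_bytes(channels, height, width, kernel=3, stride=1,
                        padding=1, bytes_per_elem=2):
    """Bytes needed to materialize the im2col matrix for one 2D convolution."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    # One row per output pixel, one column per (channel, ky, kx) tap.
    return out_h * out_w * channels * kernel * kernel * bytes_per_elem


def input_bytes(channels, height, width, bytes_per_elem=2):
    """Bytes occupied by the input feature map itself."""
    return channels * height * width * bytes_per_elem


# Hypothetical decoder layer: 128 channels at 512x512, 16-bit elements (assumed).
buf = im2col_buffer_bytes(128, 512, 512)
src = input_bytes(128, 512, 512)
print(f"input: {src / 2**20:.0f} MiB, im2col: {buf / 2**20:.0f} MiB, "
      f"ratio: {buf / src:.0f}x")
# prints "input: 64 MiB, im2col: 576 MiB, ratio: 9x"
```

With a 3x3 kernel at stride 1, every input element is copied into nine patches, so the scratch buffer is about nine times the size of the feature map it was built from.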

I have been working on a kernel that merges im2col and matrix multiplications to avoid materializing a lot of data in memory, although that entails a 40% performance reduction. So far, I am only doing this for CUDA; for CPU it will be more difficult and will likely have a negative impact on performance.
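The trade-off can be illustrated with a toy single-channel convolution in pure Python. This is only a sketch of the general idea (materialize all patches, then multiply, versus reading each patch at the moment it is needed); it is not the CUDA kernel mentioned above, and the function names are hypothetical.

```python
def conv2d_im2col(image, kernel):
    """Valid 2D cross-correlation that first materializes every patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    flat_kernel = [v for row in kernel for v in row]
    # im2col step: one flattened patch per output pixel, all held in memory at once.
    columns = [
        [image[y + dy][x + dx] for dy in range(kh) for dx in range(kw)]
        for y in range(out_h) for x in range(out_w)
    ]
    # Matrix multiplication step: each patch row dotted with the flattened kernel.
    flat_out = [sum(a * b for a, b in zip(col, flat_kernel)) for col in columns]
    out = [flat_out[y * out_w:(y + 1) * out_w] for y in range(out_h)]
    return out, len(columns) * len(flat_kernel)  # result, scratch elements used


def conv2d_fused(image, kernel):
    """Same result, but each patch is read in place and never stored."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out, 0  # no patch matrix is allocated


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
res_a, scratch_a = conv2d_im2col(image, kernel)
res_b, scratch_b = conv2d_fused(image, kernel)
print(res_a == res_b, scratch_a, scratch_b)
```

The fused version avoids the scratch buffer entirely, but it also gives up the single large matrix multiplication that optimized GEMM routines handle well, which is one plausible reason for a slowdown like the one described.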

May 13 '24 13:05 FSSRepo

> Currently, im2col is being used for convolutions, which consumes a very high amount of RAM during the VAE phase.

But I did my comparison with TAESD instead of the VAE, so I think that means the VAE isn't used at all? TAESD is super lightweight already.

May 13 '24 14:05 JohnAlcatraz
