Daniel Bolya comments

Results: 59 comments of Daniel Bolya

support for SDXL

I haven't looked into it. How does SDXL differ from normal SD? If it's similar, there's probably a way to get it to work.

support for SDXL

Does it speed it up? I think the default behavior of the diffusers implementation is to do nothing when wrapping the wrong thing, so it might not actually be doing...

No reduction in graphics memory

All the benchmarks in the paper were done using the original stable diffusion repo, not diffusers (and diffusers may give different results). Also, it's hard to get an accurate memory...

Tomesd is not reducing the total memory consumption.

I believe someone looked into this before and found that diffusers was already using memory efficient attention, which saves most of the memory that ToMe would have saved anyway. The...

Failed to run on M1Mac with automatic1111 web ui

Hmm, if you change every mention of int64 to int32 in merge.py and reinstall, does it work?

Failed to run on M1Mac with automatic1111 web ui

Maybe you need to put `export PYTORCH_ENABLE_MPS_FALLBACK=1` in the webui launch script (see #15).
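If you are running outside the webui (say, from your own Python script), a rough equivalent of that `export` line might look like the sketch below. It assumes the variable has to be set before `torch` is first imported, which is the safe ordering; the commented-out import is only there to show where your own imports would go.

```python
import os

# Assumption: set this before the first `import torch`, so that PyTorch
# sees it when the MPS backend is initialised. It lets operators that
# MPS does not implement fall back to the CPU instead of raising.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# Only import torch (and anything that imports it) after this point, e.g.:
# import torch
```

The shell `export` in the launch script does the same thing for the whole webui process; this version only affects the Python process that runs it.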

Failed to run on M1Mac with automatic1111 web ui

@Awethon are you on the latest dev build? (you have to install from source) That error was fixed already, but I haven't pushed it to pip yet.

Failed to run on M1Mac with automatic1111 web ui

Does ToMe work for you outside of the webui (for instance, in diffusers)? The error you got originally seems to me like MPS doesn't support negating an int, which is...

Out of Memory when used with Tiled VAE

It's unfortunate, but the current implementation of ToMe uses more memory when also using xformers/flash attn/torch 2.0 sdp attn or whatever. Without those implementations, ToMe reduces memory usage by reducing...

Could you provide codes for calculating FID, Time (s/im), and Memory (GB/im)

For FID, I used [pytorch-fid](https://github.com/mseitzer/pytorch-fid) (see the details in the paper for what sets I compared). For time taken, I simply timed how long a full 2000-image generation run...
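For the timing part, here is a minimal sketch of one way to turn a timed run into a seconds-per-image number. It is not the script used for the paper: `seconds_per_image` and `generate_one` are hypothetical names, and dividing total wall-clock time by the image count is an assumption about how the s/im figure is formed.

```python
import time


def seconds_per_image(generate_one, num_images):
    """Time a whole generation run and return average wall-clock s/im.

    `generate_one` is a hypothetical zero-argument callable that produces
    a single image (for example, a wrapper around your sampling loop).
    """
    if num_images <= 0:
        raise ValueError("num_images must be positive")

    start = time.perf_counter()
    for _ in range(num_images):
        generate_one()
    # Note: on a GPU, make sure all queued work has finished before this
    # point (for example by synchronising the device), or the measured
    # time will be too short.
    elapsed = time.perf_counter() - start

    return elapsed / num_images
```

Usage would be something like `seconds_per_image(my_generate_fn, 2000)`, run once with ToMe applied and once without, keeping every other setting the same.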