Ryan Dick
> Did not consider because... well, I didn't know it was implemented there. Easier to do it this way, but one advantage of keeping the source might be to...
This actually sounds like the expected behaviour. We no longer support non-lazy model loading. So, with lazy offloading and the VRAM cache limit so low, only the last used model...
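To illustrate the effect (a minimal sketch in plain Python, not the actual model cache code): with a tiny VRAM limit and lazy offloading, an LRU-style cache keeps only the most recently used model resident and offloads everything else.

```python
from collections import OrderedDict

class TinyVRAMCache:
    """Illustrative LRU cache, not the real implementation: with a very small
    VRAM limit, only the most recently used model stays resident."""

    def __init__(self, vram_limit_bytes: int):
        self._limit = vram_limit_bytes
        self._resident: "OrderedDict[str, int]" = OrderedDict()  # key -> size in bytes

    def load(self, key: str, size_bytes: int) -> None:
        # Mark the model as most recently used.
        self._resident[key] = size_bytes
        self._resident.move_to_end(key)
        # Lazily offload least-recently-used models until we fit under the limit.
        while sum(self._resident.values()) > self._limit and len(self._resident) > 1:
            evicted, _ = self._resident.popitem(last=False)
            print(f"offloading {evicted} to CPU")

cache = TinyVRAMCache(vram_limit_bytes=1)
cache.load("unet", 4)
cache.load("text_encoder", 2)  # "unet" is offloaded; only the last used model stays
```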
That proposal sounds reasonable and relatively straightforward from an implementation perspective. My main concern is that it might be hard for a user to find a `clear_cache_after` value that is...
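A rough sketch of the tuning problem, assuming a setting along these lines (the names here are hypothetical, not the actual config API):

```python
# Hypothetical setting: clear the model cache every N invocations. A value that
# is too low throws away models that are about to be reused; too high and the
# cache keeps growing until memory pressure becomes a problem.
clear_cache_after = 10

def maybe_clear(cache: dict, invocation_count: int) -> None:
    if invocation_count % clear_cache_after == 0:
        cache.clear()  # all models must be reloaded on the next use
```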
Link to the diffusers issue for reference: https://github.com/huggingface/diffusers/issues/9171
I haven't looked at the code yet, but do you know if there are still use cases for using attention processors other than Torch 2.0 SDP? Based on the benchmarking...
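For reference, the Torch 2.0 SDP path boils down to a single call (toy shapes below, not tied to any particular model):

```python
import torch
import torch.nn.functional as F

# Toy tensors shaped (batch, heads, sequence_len, head_dim).
q = torch.randn(1, 8, 64, 40)
k = torch.randn(1, 8, 64, 40)
v = torch.randn(1, 8, 64, 40)

# scaled_dot_product_attention dispatches to flash / memory-efficient kernels
# when available and falls back to the math implementation otherwise.
out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
print(out.shape)  # torch.Size([1, 8, 64, 40])
```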
I thought about this some more, and I'm hesitant to proceed with trying to merge this until we have more clarity around which attention implementations we actually want to support....
Not for this PR, but I did some performance testing and we'll probably want to address this at some point:

SDXL:
```bash
>>> Time taken to prepare attention processors: 0.10069823265075684s...
```
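A measurement along these lines is straightforward to reproduce (a sketch; `prepare_attention_processors` is a stand-in for the actual setup step, not the real function name):

```python
import time

def prepare_attention_processors() -> None:
    """Stand-in for the real setup step being timed; replace with the actual call."""
    time.sleep(0.1)

start = time.time()
prepare_attention_processors()
print(f">>> Time taken to prepare attention processors: {time.time() - start}s")
```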
It looks like there was a significant rewrite of the attention logic after the latest round of review and testing on this PR. @StAlKeR7779 can you shed some light on...
@gigend @KudintG @tensorflow73 Just to confirm, are you all seeing `Process exited with code: 3221225477`? Or just the same warnings that lead up to it? And, can you all confirm...
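For context on the code itself: 3221225477 is 0xC0000005, the Windows STATUS_ACCESS_VIOLATION status, i.e. a crash in native code rather than a Python-level exception.

```python
# 3221225477 decoded: this is the Windows STATUS_ACCESS_VIOLATION status code,
# which indicates a native access violation rather than a Python-level error.
print(hex(3221225477))  # 0xc0000005
```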
> I've found that the base Flux schnell model with the standard T5 works fine, but other flux models crash it out with the error pointing to the flash attention...