[WIP] Rolling KV cache for autoregressive generation
What does this PR do?
Fixes https://github.com/huggingface/diffusers/issues/12600
Functionality-wise, the self-attention cache seems to work correctly; cross-attention support still has to be added and verified. I added Krea to test the cache, though I am not yet getting the same output as the original model. From quick debugging, the mismatch looked related to the timesteps or the RoPE embeddings. Opening this as a draft as a reminder to myself to give this feature higher priority. A minimal sketch of the rolling-cache idea is below.
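For reference, here is a minimal sketch of what a rolling (sliding-window) KV cache for self-attention could look like. This is not the PR's implementation; the class name, window size, and tensor layout are hypothetical and only illustrate the idea of evicting the oldest cached keys/values once the window fills up.

```python
# Hypothetical sketch of a rolling KV cache; not the PR's actual API.
import torch


class RollingKVCache:
    """Keeps at most `max_len` positions of keys/values,
    discarding the oldest entries once the window is full."""

    def __init__(self, max_len: int):
        self.max_len = max_len
        self.keys = None    # assumed shape: [batch, heads, seq, head_dim]
        self.values = None

    def update(self, new_keys: torch.Tensor, new_values: torch.Tensor):
        # Append the new step(s) along the sequence axis.
        if self.keys is None:
            self.keys, self.values = new_keys, new_values
        else:
            self.keys = torch.cat([self.keys, new_keys], dim=2)
            self.values = torch.cat([self.values, new_values], dim=2)
        # Roll: keep only the most recent `max_len` positions.
        if self.keys.shape[2] > self.max_len:
            self.keys = self.keys[:, :, -self.max_len:]
            self.values = self.values[:, :, -self.max_len:]
        return self.keys, self.values


# Usage: at each autoregressive step, only the new tokens' K/V are computed
# and appended; attention then runs against the (bounded) cached window.
cache = RollingKVCache(max_len=16)
k_new = torch.randn(1, 8, 1, 64)
v_new = torch.randn(1, 8, 1, 64)
k_all, v_all = cache.update(k_new, v_new)
```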