
[WIP] Rolling KV cache for autoregressive generation

Open · zucchini-nlp opened this pull request 1 month ago · 1 comment

What does this PR do?

Fixes https://github.com/huggingface/diffusers/issues/12600

Functionality-wise, the self-attention cache seems to work correctly; cross-attention caching still has to be added and verified. I added Krea to test the cache, though I am not yet getting the same output as the original model. From quick debugging, it looks related to the timesteps or RoPE embeddings. Opening a draft as a reminder to myself to give this feature higher priority.
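
For context, below is a minimal sketch of what a rolling (sliding-window) KV cache for self-attention could look like. The class name, `update` signature, and tensor layout are illustrative assumptions and do not reflect the actual implementation in this PR or the diffusers API.

```python
# Minimal, illustrative sketch of a rolling (sliding-window) KV cache for
# self-attention. Names and signatures are hypothetical, not the diffusers API.
import torch


class RollingKVCache:
    """Keeps at most `max_cache_len` tokens of key/value states per layer,
    discarding the oldest entries once the window is full."""

    def __init__(self, max_cache_len: int):
        self.max_cache_len = max_cache_len
        self.key_cache: list[torch.Tensor] = []    # one tensor per layer
        self.value_cache: list[torch.Tensor] = []

    def update(self, key_states: torch.Tensor, value_states: torch.Tensor, layer_idx: int):
        # key_states / value_states: [batch, num_heads, new_tokens, head_dim]
        if layer_idx >= len(self.key_cache):
            # First call for this layer: initialize the buffers.
            self.key_cache.append(key_states)
            self.value_cache.append(value_states)
        else:
            # Append the new tokens along the sequence dimension.
            self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
            self.value_cache[layer_idx] = torch.cat([self.value_cache[layer_idx], value_states], dim=-2)

        # Roll the window: keep only the most recent `max_cache_len` tokens.
        self.key_cache[layer_idx] = self.key_cache[layer_idx][..., -self.max_cache_len:, :]
        self.value_cache[layer_idx] = self.value_cache[layer_idx][..., -self.max_cache_len:, :]
        return self.key_cache[layer_idx], self.value_cache[layer_idx]


# Example usage with made-up shapes:
cache = RollingKVCache(max_cache_len=512)
k = torch.randn(1, 8, 16, 64)  # batch, heads, new_tokens, head_dim
v = torch.randn(1, 8, 16, 64)
keys, values = cache.update(k, v, layer_idx=0)
```

Cross-attention would likely need separate handling: its key/value states come from the conditioning inputs and are typically computed once and reused across steps, so they can be cached whole rather than rolled.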

zucchini-nlp · Dec 02 '25 10:12

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.