Integrate xLSTM cleanly.
What does this PR do?
This PR integrates xLSTM via the xlstm library, including certain optimizations (optionally using torch.compile and CUDA graphs for speed-ups). This enables using NX-AI/xLSTM-7b without a special fork of transformers.
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests? Yes, I adapted the tests of the recurrent Mamba2 model.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@ArthurZucker
How can this PR keep failing inside other models' tests? How did those failures get into `main`?
We finally got all tests passing and have now integrated a standalone version in the code, so the external libraries are only needed for further speed-ups. Thanks for the further review!
Comment on the text-generation-level changes: the `xLSTMCache` class is missing proper documentation (see `MambaCache` for an example) and should be added to `__init__.py`.
I think the documentation should now be similar, thanks for pointing this out!
@Rocketknight1 @Cyrilvallez , sorry for the long-time stall of this, I think it should be ready now for a review, thanks! :)
cc @Cyrilvallez as I'll be off for 2 weeks!
Hey! Sorry for the delay! Here I must say I mostly do not understand the Cache that was added. Could you explain a bit more why it's needed? From what I see, it simply returns the current states, so I don't understand the usage. Also, let's simplify all that is config-related 🤗
There was a delay from our side as well beforehand, so all good! Yes, the cache object stores the three RNN states. It is needed as a replacement for a KV cache, as it can greatly accelerate inference. It was introduced as a separate class, similar to MambaCache. Feel free to propose a different way to integrate this. All the config parts should be sufficiently flattened now.
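To make the role of the cache concrete, here is a minimal, torch-free sketch of the idea (the class name, state names, and list-based "tensors" are purely illustrative, not the actual xLSTMCache implementation): instead of a KV cache that grows with every generated token, a recurrent model only needs a fixed-size set of states per layer, which the cache overwrites at each step.

```python
from dataclasses import dataclass, field


@dataclass
class RecurrentStateCache:
    """Schematic stand-in for a cache like xLSTMCache or MambaCache.

    A KV cache grows key/value tensors per generated token; a recurrent
    model instead carries fixed-size states per layer. Three named states
    per layer (c, n, m) are assumed here purely for illustration.
    """
    num_layers: int
    states: dict = field(default_factory=dict)

    def __post_init__(self):
        # One empty (c, n, m)-style state triple per layer.
        self.states = {i: {"c": None, "n": None, "m": None}
                       for i in range(self.num_layers)}

    def update(self, layer_idx, c, n, m):
        # Overwrite the fixed-size states in place; memory use does not
        # grow with sequence length, unlike a KV cache.
        self.states[layer_idx] = {"c": c, "n": n, "m": m}
        return self.states[layer_idx]

    def get(self, layer_idx):
        return self.states[layer_idx]


cache = RecurrentStateCache(num_layers=2)
cache.update(0, c=[0.1], n=[0.2], m=[0.3])
print(cache.get(0)["n"])  # prints [0.2]
```

At generation time, each decoding step reads the previous states from the cache and writes back the updated ones, so the prefix never needs to be re-processed.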
@Rocketknight1, @ArthurZucker, @gante, @Cyrilvallez, @stevhliu Any further comments? I think most (all?) of your concerns have been addressed now. It would be great to finalize this soon.
Hi @kpoeppel, sorry for the delay! I think this is just about ready to go, but I'll need to grab a core maintainer for a final review + merge, and schedules have been difficult between other sprints + holidays. Hopefully I can get one in the next week or two!
cc @cyrilvallez for final review maybe, since you're back now, but I know I've dropped a lot of big review requests on you. Feel free to pass it on to someone else!
Hey! Super super sorry for the delay! Here is a new round of reviews! Let me know if something is unclear 🤗 Still mostly concerned about the Cache (is it needed??), unnecessary abstractions, single-letter variables, and asserts 🤗
Hey! Sorry also for the delay from my side. I integrated all your comments; I think the xLSTMCache is still necessary (like a MambaCache or KVCache). Since it has a different structure than those, we need a separate class.
Hey! Sorry for the delay! Here is a new review 🤗 Let me know if something is still unclear!
Thanks for the next review!
I moved the xLSTMCache to modeling_xlstm.py and resolved all other issues. However, the auto_docstring decorator now fails to work, as xLSTMCache is probably no longer global. Should I switch back to the non-auto_docstring docstring, or is there a better way to fix this?
I don't think it has anything to do with the class being public or not! You can find everything you need about auto_docstring here! Basically, you need to add docstrings only for "unknown" args; e.g., here cache_params is unknown to the library.
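To illustrate the general mechanism (this is a toy stand-in, not transformers' actual `auto_docstring` implementation or its registry): a decorator auto-generates documentation for args the library already knows, while model-specific args like `cache_params` keep the docstring written locally on the function.

```python
import inspect

# Hypothetical registry of args the "library" already knows how to document.
KNOWN_ARGS = {
    "input_ids": "input_ids: token indices of the input sequence.",
}


def auto_docstring_sketch(func):
    """Toy stand-in for a decorator like transformers' auto_docstring.

    Known args get auto-generated documentation; "unknown" args (like a
    model-specific cache_params) must be documented in the function's own
    docstring, which is kept as-is and appended.
    """
    params = inspect.signature(func).parameters
    auto_lines = [KNOWN_ARGS[p] for p in params if p in KNOWN_ARGS]
    local = inspect.getdoc(func) or ""
    func.__doc__ = "\n".join(auto_lines + [local])
    return func


@auto_docstring_sketch
def forward(input_ids, cache_params=None):
    """cache_params: model-specific recurrent state cache (documented locally)."""
    return input_ids


print(forward.__doc__)
```

The resulting docstring contains both the auto-generated `input_ids` line and the hand-written `cache_params` line, which mirrors why only the "unknown" arg needs a manual docstring entry.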
There was a leftover xLSTMCache mention in the generation docs files. So all your comments should be integrated now. :)
[For maintainers] Suggested jobs to run (before merge)
run-slow: auto, xlstm
Closing this as I just merged https://github.com/huggingface/transformers/pull/39665! Thanks a lot for the contribution, and for bearing with us during the review process! 🤗 Nice work!
@Cyrilvallez thanks for all the feedback. Nice that it finally made it in! :)