Integrate xLSTM cleanly.
What does this PR do?
This PR integrates xLSTM via the xlstm library, including certain optimizations (optionally using torch.compile and CUDA graphs for speed-ups). This enables using NX-AI/xLSTM-7b without a special fork of transformers.
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests? Yes, I adapted the tests of the recurrent Mamba2 model.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@ArthurZucker
How can this PR keep failing inside other models' tests? How did those failures get into `main`?
We finally got all tests passing and have now integrated a standalone version in the code, so the external libraries are only needed for further speed-ups. Thanks for the further review!
Comment on the text-generation-level changes: the `xLSTMCache` class is missing proper documentation (see `MambaCache` for an example) and should be added to `__init__.py`.
I think the documentation should now be similar, thanks for pointing this out!
@Rocketknight1 @Cyrilvallez , sorry for the long-time stall of this, I think it should be ready now for a review, thanks! :)
cc @Cyrilvallez as I'll be off for 2 weeks!
Hey! Sorry for the delay! Here I must say I mostly do not understand the Cache that was added. Could you explain a bit more why it's needed? From what I see, it simply returns the current states, so I don't understand the usage. Also, let's simplify all that is config-related 🤗
There was a delay from our side as well beforehand, so all good! Yes, the cache object stores the three RNN states. It is needed as a replacement for a KV cache, as it can greatly accelerate inference. It was introduced as a separate class, similar to MambaCache. Feel free to propose a different way to integrate this. All the config parts should be sufficiently flattened now.
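To make the role of the cache concrete, here is a minimal, torch-free sketch of the idea (the class name, state names, and list-based "tensors" are purely illustrative, not the actual xLSTMCache implementation): instead of a KV cache that grows with every generated token, a recurrent model only needs a fixed-size set of states per layer, which the cache overwrites at each step.

```python
from dataclasses import dataclass, field


@dataclass
class RecurrentStateCache:
    """Schematic stand-in for a cache like xLSTMCache or MambaCache.

    A KV cache grows key/value tensors per generated token; a recurrent
    model instead carries fixed-size states per layer. Three named states
    per layer (c, n, m) are assumed here purely for illustration.
    """
    num_layers: int
    states: dict = field(default_factory=dict)

    def __post_init__(self):
        # One empty (c, n, m)-style state triple per layer.
        self.states = {i: {"c": None, "n": None, "m": None}
                       for i in range(self.num_layers)}

    def update(self, layer_idx, c, n, m):
        # Overwrite the fixed-size states in place; memory use does not
        # grow with sequence length, unlike a KV cache.
        self.states[layer_idx] = {"c": c, "n": n, "m": m}
        return self.states[layer_idx]

    def get(self, layer_idx):
        return self.states[layer_idx]


cache = RecurrentStateCache(num_layers=2)
cache.update(0, c=[0.1], n=[0.2], m=[0.3])
print(cache.get(0)["n"])  # prints [0.2]
```

At generation time, each decoding step reads the previous states from the cache and writes back the updated ones, so the prefix never needs to be re-processed.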
@Rocketknight1, @ArthurZucker, @gante, @Cyrilvallez, @stevhliu Any further comments? I think most (all?) of your concerns have been addressed now. It would be great to finalize this soon.
Hi @kpoeppel, sorry for the delay! I think this is just about ready to go, but I'll need to grab a core maintainer for a final review + merge, and schedules have been difficult between other sprints + holidays. Hopefully I can get one in the next week or two!
cc @cyrilvallez for final review maybe, since you're back now, but I know I've dropped a lot of big review requests on you. Feel free to pass it on to someone else!
Hey! Super super sorry for the delay! Here is a new round of reviews! Let me know if something is unclear 🤗 Still mostly concerned about the Cache (is it needed??), unnecessary abstractions, single-letter variables, and asserts 🤗
Hey! Sorry also for the delay from my side. I integrated all your comments; I think the xLSTMCache is still necessary (like a MambaCache or KVCache). Since it has a different structure than those, we need a separate class.
Hey! Sorry for the delay! Here is a new review 🤗 Let me know if something is still unclear!
Thanks for the next review!
I moved the xLSTMCache to modeling_xlstm.py and resolved all other issues. However, the auto_docstring decorator now fails to work, as xLSTMCache is probably no longer global. Should I switch back to the non-auto_docstring docstring, or is there a better way to fix this?
I don't think it has anything to do with the class being public or not! You can find everything you need about auto_docstring here! Basically, you need to add docstrings only for "unknown" args; e.g., here cache_params is unknown to the library.
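To illustrate the general mechanism (this is a toy stand-in, not transformers' actual `auto_docstring` implementation or its registry): a decorator auto-generates documentation for args the library already knows, while model-specific args like `cache_params` keep the docstring written locally on the function.

```python
import inspect

# Hypothetical registry of args the "library" already knows how to document.
KNOWN_ARGS = {
    "input_ids": "input_ids: token indices of the input sequence.",
}


def auto_docstring_sketch(func):
    """Toy stand-in for a decorator like transformers' auto_docstring.

    Known args get auto-generated documentation; "unknown" args (like a
    model-specific cache_params) must be documented in the function's own
    docstring, which is kept as-is and appended.
    """
    params = inspect.signature(func).parameters
    auto_lines = [KNOWN_ARGS[p] for p in params if p in KNOWN_ARGS]
    local = inspect.getdoc(func) or ""
    func.__doc__ = "\n".join(auto_lines + [local])
    return func


@auto_docstring_sketch
def forward(input_ids, cache_params=None):
    """cache_params: model-specific recurrent state cache (documented locally)."""
    return input_ids


print(forward.__doc__)
```

The resulting docstring contains both the auto-generated `input_ids` line and the hand-written `cache_params` line, which mirrors why only the "unknown" arg needs a manual docstring entry.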
There was a leftover xLSTMCache mention in the generation docs files. So all your comments should be integrated now. :)
[For maintainers] Suggested jobs to run (before merge)
run-slow: auto, xlstm
Closing this as I just merged https://github.com/huggingface/transformers/pull/39665! Thanks a lot for the contribution, and for bearing with us during the review process! 🤗 Nice work!
@Cyrilvallez thanks for all the feedback. Nice that it finally made it in! :)