Add internvl_flash model
Resolves #41862
Hi @zucchini-nlp and @Rocketknight1,
Following your guidance in the issue, this PR re-implements the InternVL-Flash model as a completely separate model (instead of using an if flag in the existing InternVL class).
## Implementation Details

- Created a new, independent model directory: `src/transformers/models/internvl_flash/`.
- Used the `transformers add-new-model-like` script to scaffold the new model, as you suggested.
- Implemented the model logic in `modular_internvl_flash.py` (including `Gating`, `CrossAttentionPooling`, etc.) and converted it with the modular script; a rough sketch of the pooling idea follows below.
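For context while reviewing, here is a minimal sketch of the cross-attention pooling idea, in which a small set of learned queries compresses a long visual token sequence into a fixed number of tokens. This is illustrative only: the class body and parameter names (`hidden_size`, `num_queries`, `num_heads`) are assumptions, not the exact code in `modular_internvl_flash.py`.

```python
import torch
import torch.nn as nn

class CrossAttentionPooling(nn.Module):
    """Illustrative sketch: compress a long visual token sequence down to a
    fixed number of learned query tokens via cross-attention. Not the exact
    InternVL-Flash implementation."""

    def __init__(self, hidden_size: int, num_queries: int, num_heads: int = 8):
        super().__init__()
        # Learned queries that attend over the full visual token sequence.
        self.queries = nn.Parameter(torch.randn(num_queries, hidden_size))
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (batch, seq_len, hidden_size)
        batch_size = visual_tokens.shape[0]
        queries = self.queries.unsqueeze(0).expand(batch_size, -1, -1)
        pooled, _ = self.attn(queries, visual_tokens, visual_tokens)
        return self.norm(pooled)  # (batch, num_queries, hidden_size)
```

The `Gating` module is separate; the authoritative versions of both live in `modular_internvl_flash.py` and its generated modeling file.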
## Testing

All local tests are passing:

- `make fixup` (style, quality, and repository consistency checks all pass)
- `pytest tests/models/internvl_flash/test_modeling_internvl_flash.py`
Thank you for the guidance!
## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. (Link: #41862)
- [x] Did you make sure to update the documentation with your changes? (Added `docs/source/en/model_doc/internvl_flash.md` and updated `_toctree.yml`)
- [x] Did you write any new necessary tests? (Added `tests/models/internvl_flash/test_modeling_internvl_flash.py`)
## Who can review?

@zucchini-nlp @Rocketknight1
Taking a look tomorrow/Monday, thanks for making a new model class!
@zucchini-nlp Thank you for the advice. This is my first time submitting a PR, and I'm working to resolve the test failures related to batch_size > 1 support. My initial intention in adding the non-Flash methods was specifically to bypass these failing tests temporarily; I will continue working on a full solution.
[For maintainers] Suggested jobs to run (before merge)
run-slow: auto, internvl_flash
@zucchini-nlp I've finished the requested modifications. Please let me know if there are any other points to discuss before we merge.
Thanks, I will review sometime this week. It was a bit hectic last week due to v5.
Thank you for your review! I'm currently working on my finals; hoping to solve these problems ASAP.
No problem, take your time :)
Hello, is there any progress? I am the author of the issue https://github.com/huggingface/transformers/issues/41862. I currently have some free time; is there anything I can do to help with the PR?
I am trying to convert the raw InternVL-3.5-Flash model to Hugging Face format by modifying https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/tools/internvl_custom2hf.py. I found that there are two additional params in InternvlFlashConfig that do not exist in InternVLConfig. Is it possible to add these two parameters in InternvlFlashConfig? (Although even without them, they can still be manually added in config.json to achieve the conversion.)
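In case it helps anyone else attempting the conversion, a minimal sketch of the manual config.json workaround mentioned above. The key names below are placeholders, since the two params are not named in this thread; substitute the actual InternvlFlashConfig-only keys:

```python
import json

# Placeholder names only; replace with the two actual keys that exist in
# InternvlFlashConfig but not in InternVLConfig.
EXTRA_KEYS = {"flash_extra_param_1": None, "flash_extra_param_2": None}

with open("config.json") as f:
    config = json.load(f)

# Add any missing InternvlFlashConfig-only keys with a chosen default.
for key, default in EXTRA_KEYS.items():
    config.setdefault(key, default)

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```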
@YanxingLiu Sorry for not checking my mail. I used the InternVL conversion scripts, with some minor changes, to convert the models. They can be found at https://huggingface.co/chenhaoguan/InternVL3_5-2B-Flash-hf. I have mailed you my version; I haven't made any other changes. Feel free to ask me if you have any questions.
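In case it helps with verification, a minimal loading sketch for the converted checkpoint (assuming InternVL-Flash registers under the same auto classes as the existing InternVL port; untested against this exact repo):

```python
from transformers import AutoConfig, AutoModelForImageTextToText, AutoProcessor

# Assumption: InternVL-Flash registers under the same auto classes as InternVL.
model_id = "chenhaoguan/InternVL3_5-2B-Flash-hf"

config = AutoConfig.from_pretrained(model_id)
print(type(config).__name__)  # expected: an InternVL-Flash config class

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")
```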