Add LightOnOCR model implementation
What does this PR do?
Implementation of the LightOnOCR model, following the Modular Transformers architecture.
Our model is a 1B parameter OCR model using Pixtral as the vision encoder and Qwen3 as the LLM decoder.
I still have an issue with auto configuration, which I'm not familiar with:
🚨 Config not found for lightonocr. You can manually add it to HARDCODED_CONFIG_FOR_MODELS in utils/auto_docstring.py
Fixes # (issue)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Hi @molbap @zucchini-nlp! Thanks for the detailed review and the instructions! Sorry I didn't get back to you last week; I was a little overloaded with the release.
I went through your comments and I'm still running into test issues. The problem is that modular is trying to reimplement a common attention mechanism for both the Pixtral vision model and the Qwen3 text model, which is causing some conflicts. Have you dealt with this before? Not sure how to handle it :(
Hi, I think all the tests pass now and I rebased the branch :). Please tell me if I need to do anything else. Thanks a lot for your help!
run-slow: lightonocr
This comment contains run-slow, running the specified jobs:
models: ['models/lightonocr'] quantizations: [] ...
There are a couple of errors on the slow tests I just ran:
FAILED tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_can_generate_without_images - AttributeError: 'LightOnOCRConfig' object has no attribute 'vocab_size'
FAILED tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_forward_with_images - AttributeError: 'LightOnOCRConfig' object has no attribute 'vocab_size'
so just a missing config key / a wrong access IMO. The other one, which is suspicious, is
def vision_apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
"""Applies Rotary Position Embedding to the query and key tensors.
.......
"""
cos = cos.unsqueeze(unsqueeze_dim)
sin = sin.unsqueeze(unsqueeze_dim)
> q_embed = (q * cos) + (vision_rotate_half(q) * sin)
E RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
which happens only in a multi-GPU setting, but it means the tensors should be placed on a consistent device, at least within this util function to be safe.
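The device-safe fix suggested above could look roughly like this. This is an illustrative sketch, not the actual model code: vision_rotate_half is reimplemented here in its usual rotary-embedding form, and the .to(q.device) calls are the hypothetical change.

```python
import torch


def vision_rotate_half(x):
    # Rotates half of the hidden dims: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def vision_apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    """Applies Rotary Position Embedding to the query and key tensors."""
    # Hypothetical fix: move cos/sin to q's device so that multi-GPU
    # sharding (device_map="auto") cannot mix cuda:0 and cuda:1 here.
    cos = cos.unsqueeze(unsqueeze_dim).to(q.device)
    sin = sin.unsqueeze(unsqueeze_dim).to(q.device)
    q_embed = (q * cos) + (vision_rotate_half(q) * sin)
    k_embed = (k * cos) + (vision_rotate_half(k) * sin)
    return q_embed, k_embed
```

With cos set to ones and sin set to zeros the rotation is the identity, which makes a quick sanity check easy even on CPU.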
I added a vocab_size getter and put q and k on the same device as cos and sin :)
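For context, a vocab_size getter on a composite config usually just delegates to the inner text config. A minimal sketch of that pattern, where the class names and the default value are illustrative stand-ins, not the real LightOnOCR code:

```python
# Illustrative stand-in for the Qwen3-style decoder config.
class TextConfigSketch:
    def __init__(self, vocab_size=151936):  # hypothetical default
        self.vocab_size = vocab_size


class LightOnOCRConfigSketch:
    """Composite config that exposes vocab_size by delegating to text_config."""

    def __init__(self, text_config=None):
        self.text_config = text_config or TextConfigSketch()

    @property
    def vocab_size(self):
        # Delegate instead of duplicating the value, so the two never diverge.
        return self.text_config.vocab_size
```

The property keeps code that accesses config.vocab_size on the top-level config working without storing the value twice.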
Hi @molbap, thanks again for your help! Could you please re-trigger the slow tests to make sure everything passes before the final reviewer steps in :) ? Thanks
run-slow: lightonocr
This comment contains run-slow, running the specified jobs:
models: ["models/lightonocr"] quantizations: []
Hey @baptiste-aubertin, while we're waiting on the core maintainer's review, I left a few comments and questions below, mostly about things that look redundant to me. LMK what you think and whether we can delete those?
Hi @zucchini-nlp, thank you! This time I will try to answer quickly 😅
run-slow: lightonocr
This comment contains run-slow, running the specified jobs:
models: ["models/lightonocr"] quantizations: []
The run-slow command is changing, so don't worry about the CI result! Thanks for the fixup :) And really, thanks for the fast iterations; it makes a world of difference on model integrations.
hey hey, there's just this
Traceback (most recent call last):
File "/home/runner/work/transformers/transformers/.venv/bin/doc-builder", line 10, in <module>
sys.exit(main())
File "/home/runner/work/transformers/transformers/.venv/lib/python3.10/site-packages/doc_builder/commands/doc_builder_cli.py", line 50, in main
args.func(args)
File "/home/runner/work/transformers/transformers/.venv/lib/python3.10/site-packages/doc_builder/commands/build.py", line 103, in build_command
build_doc(
File "/home/runner/work/transformers/transformers/.venv/lib/python3.10/site-packages/doc_builder/build_doc.py", line 392, in build_doc
anchors_mapping, source_files_mapping = build_mdx_files(
File "/home/runner/work/transformers/transformers/.venv/lib/python3.10/site-packages/doc_builder/build_doc.py", line 255, in build_mdx_files
raise type(e)(f"There was an error when converting {file} to the MDX format.\n" + e.args[0]) from e
ValueError: There was an error when converting ../transformers/docs/source/en/model_doc/lightonocr.md to the MDX format.
Unable to find LightOnOCRTextConfig in transformers. Make sure the path to that object is correct.
It is likely caused by the config object missing from __all__? Other than that, all looks good. cc @Cyrilvallez for core review, as this is mergeable soon!
Hi, I added the missing configs to __all__ and rebased :) I hope it solves the issue!
Hi, I rebased and adapted the tied weights to the new format :)
I also removed the _init_weights methods, which triggered test failures and seemed to be unnecessary.
A test is not passing, but it doesn't seem to be related to our model. Could you please run the slow tests, @molbap?
run-slow: lightonocr
This comment contains run-slow, running the specified jobs:
models: ["models/lightonocr"] quantizations: []
CI Results
Model CI Report
❌ Failed tests
- lightonocr:
  - tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationModelTest::test_eager_padding_matches_padding_free_with_position_ids
  - tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_can_generate_without_images
  - tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_forward_with_images
There are just these:
FAILED tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
FAILED tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_can_generate_without_images - AttributeError: 'LightOnOCRConfig' object has no attribute 'vocab_size'
FAILED tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_forward_with_images - AttributeError: 'LightOnOCRConfig' object has no attribute 'vocab_size'
For the last two, I thought that was solved? But the first one is an "actual" issue, although you should be able to fix it by either creating the tensor on the same device as the first one, or by making sure you don't have any lingering .to(device) calls, as device_map="auto" should be able to handle the device placement!
Hi @molbap, I fixed the tests. Can you rerun the slow ones? The device test needs to be run on several GPUs :)
I rebased again and I still have tests unrelated to our model that are failing :/
Hi, I rebased this morning and all the tests are now passing 👍 😄
We have had very good results fine-tuning on LightOnOCR, and are looking forward to this being merged so we can use it in production.
cc @molbap @zucchini-nlp, feel free to tag me when you believe this is ready for final review!
Hi @Cyrilvallez @molbap, I rebased again today. All the tests seem to pass on my side; can you trigger the slow tests? :)
run-slow: lightonocr