
Add LightOnOCR model implementation

Open · baptiste-aubertin opened this pull request 2 months ago • 39 comments

What does this PR do?

Implementation of the LightOnOCR model following the modular Transformers architecture.

Our model is a 1B parameter OCR model using Pixtral as the vision encoder and Qwen3 as the LLM decoder.

I still have an issue with auto configuration, which I'm not familiar with:

🚨 Config not found for lightonocr. You can manually add it to HARDCODED_CONFIG_FOR_MODELS in utils/auto_docstring.py
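
For reference, the fix this error hints at is presumably a one-line entry in that dict; the exact key/value shape below is an assumption on my part:

    # utils/auto_docstring.py (sketch; the real dict shape may differ)
    HARDCODED_CONFIG_FOR_MODELS = {
        # ... existing entries ...
        "lightonocr": "LightOnOCRConfig",
    }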

Fixes # (issue)

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

baptiste-aubertin avatar Oct 15 '25 14:10 baptiste-aubertin

Hi @molbap @zucchini-nlp! Thanks for the detailed review and the instructions! Sorry I didn't get back to you last week; I was a little overloaded with the release.

I went through your comments and I'm still running into test issues. The problem is that modular is trying to reimplement a common attention mechanism for both the Pixtral vision model and the Qwen3 text model, which is causing conflicts. Have you dealt with this before? I'm not sure how to handle it :(

baptiste-aubertin avatar Oct 27 '25 22:10 baptiste-aubertin

Hi, I think all the tests passed and I rebased the branch :). Please tell me if I need to do anything else. Thanks a lot for your help!

baptiste-aubertin avatar Oct 31 '25 13:10 baptiste-aubertin

run-slow: lightonocr

molbap avatar Oct 31 '25 13:10 molbap

This comment contains run-slow, running the specified jobs:

models: ['models/lightonocr'] quantizations: [] ...

github-actions[bot] avatar Oct 31 '25 14:10 github-actions[bot]

There are a couple of errors on the slow tests I just ran:

FAILED tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_can_generate_without_images - AttributeError: 'LightOnOCRConfig' object has no attribute 'vocab_size'
FAILED tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_forward_with_images - AttributeError: 'LightOnOCRConfig' object has no attribute 'vocab_size'

So it's just a missing config key or a wrong access, IMO. The other one, which is suspicious, is

    def vision_apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
        """Applies Rotary Position Embedding to the query and key tensors.
        .......
        """
        cos = cos.unsqueeze(unsqueeze_dim)
        sin = sin.unsqueeze(unsqueeze_dim)
>       q_embed = (q * cos) + (vision_rotate_half(q) * sin)
E       RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

This one only shows up in a multi-GPU setting, but it means the q tensor should be created on a safer device, or the tensors should at least be aligned inside this util function to be safe.
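
A minimal sketch of that kind of guard, aligning everything on q's device inside the util (vision_rotate_half below stands in for the module's existing helper, and the signature mirrors the traceback above):

    import torch


    def vision_rotate_half(x):
        # Standard rotary helper: rotates half of the last dimension.
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)


    def vision_apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
        """Applies Rotary Position Embedding to the query and key tensors."""
        # Move the rotary tables onto q's device so that sharded multi-GPU runs
        # (device_map="auto") never multiply tensors that live on different GPUs.
        cos = cos.to(q.device).unsqueeze(unsqueeze_dim)
        sin = sin.to(q.device).unsqueeze(unsqueeze_dim)
        q_embed = (q * cos) + (vision_rotate_half(q) * sin)
        k_embed = (k * cos) + (vision_rotate_half(k) * sin)
        return q_embed, k_embed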

molbap avatar Oct 31 '25 14:10 molbap

I added a vocab_size getter and put q and k on the same device as cos and sin :)
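
Roughly, the getter amounts to delegating to the text sub-config. A minimal sketch, with attribute names assumed from the failing tests rather than taken from the PR:

    from transformers import PretrainedConfig


    class LightOnOCRConfig(PretrainedConfig):
        model_type = "lightonocr"

        def __init__(self, text_config=None, vision_config=None, **kwargs):
            super().__init__(**kwargs)
            self.text_config = text_config
            self.vision_config = vision_config

        @property
        def vocab_size(self):
            # Code that reads config.vocab_size on the composite config gets
            # the value from the decoder (text) sub-config.
            return self.text_config.vocab_size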

baptiste-aubertin avatar Oct 31 '25 18:10 baptiste-aubertin

Hi @molbap, thanks again for your help! Could you please re-trigger the slow tests to make sure everything passes before the final reviewer steps in? :) Thanks

baptiste-aubertin avatar Nov 05 '25 10:11 baptiste-aubertin

run-slow: lightonocr

zucchini-nlp avatar Nov 05 '25 11:11 zucchini-nlp

This comment contains run-slow, running the specified jobs:

models: ["models/lightonocr"] quantizations: []

github-actions[bot] avatar Nov 05 '25 11:11 github-actions[bot]

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

github-actions[bot] avatar Nov 05 '25 12:11 github-actions[bot]

Hey @baptiste-aubertin, while we're waiting on the core maintainer's review, I left a few comments and questions below, mostly about things that look redundant to me. LMK what you think and whether we can delete those?

Hi @zucchini-nlp, thank you! This time I will try to answer quickly 😅

baptiste-aubertin avatar Nov 05 '25 12:11 baptiste-aubertin

run-slow: lightonocr

molbap avatar Nov 06 '25 09:11 molbap

This comment contains run-slow, running the specified jobs:

models: ["models/lightonocr"] quantizations: []

github-actions[bot] avatar Nov 06 '25 09:11 github-actions[bot]

The run-slow command is changing, so do not worry about the CI result! Thanks for the fixup :) And really, thanks for the fast iterations; it makes a world of difference on model integrations.

molbap avatar Nov 06 '25 16:11 molbap

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

github-actions[bot] avatar Nov 06 '25 17:11 github-actions[bot]

Hey hey, there's just this:


Traceback (most recent call last):
  File "/home/runner/work/transformers/transformers/.venv/bin/doc-builder", line 10, in <module>
    sys.exit(main())
  File "/home/runner/work/transformers/transformers/.venv/lib/python3.10/site-packages/doc_builder/commands/doc_builder_cli.py", line 50, in main
    args.func(args)
  File "/home/runner/work/transformers/transformers/.venv/lib/python3.10/site-packages/doc_builder/commands/build.py", line 103, in build_command
    build_doc(
  File "/home/runner/work/transformers/transformers/.venv/lib/python3.10/site-packages/doc_builder/build_doc.py", line 392, in build_doc
    anchors_mapping, source_files_mapping = build_mdx_files(
  File "/home/runner/work/transformers/transformers/.venv/lib/python3.10/site-packages/doc_builder/build_doc.py", line 255, in build_mdx_files
    raise type(e)(f"There was an error when converting {file} to the MDX format.\n" + e.args[0]) from e
ValueError: There was an error when converting ../transformers/docs/source/en/model_doc/lightonocr.md to the MDX format.
Unable to find LightOnOCRTextConfig in transformers. Make sure the path to that object is correct.

It is likely caused by the config object missing from __all__? Other than that, all looks good. cc @Cyrilvallez for core review, as this is mergeable soon!
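
If that's the cause, the fix is presumably just exporting the sub-configs from the configuration module, along these lines (the vision config name is an assumption):

    # configuration_lightonocr.py
    __all__ = ["LightOnOCRConfig", "LightOnOCRTextConfig", "LightOnOCRVisionConfig"]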

molbap avatar Nov 12 '25 09:11 molbap

Hi, I added the missing configs to __all__ and rebased :) I hope that solves the issue!

baptiste-aubertin avatar Nov 12 '25 11:11 baptiste-aubertin

Hi, I rebased and adapted the tied weights to the new format :) I also removed the _init_weights methods, which were triggering test failures and seemed to be unnecessary.

baptiste-aubertin avatar Nov 18 '25 12:11 baptiste-aubertin

A test is not passing, but it doesn't seem to be related to our model. Could you please run the slow tests, @molbap?

baptiste-aubertin avatar Nov 18 '25 13:11 baptiste-aubertin

run-slow: lightonocr

molbap avatar Nov 18 '25 15:11 molbap

This comment contains run-slow, running the specified jobs:

models: ["models/lightonocr"] quantizations: []

github-actions[bot] avatar Nov 18 '25 16:11 github-actions[bot]

CI Results

Workflow Run ⚙️

Model CI Report

❌ Failed tests

  • lightonocr:
    • tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationModelTest::test_eager_padding_matches_padding_free_with_position_ids
    • tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_can_generate_without_images
    • tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_forward_with_images

github-actions[bot] avatar Nov 18 '25 16:11 github-actions[bot]

There are just these:

 FAILED tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
FAILED tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_can_generate_without_images - AttributeError: 'LightOnOCRConfig' object has no attribute 'vocab_size'
FAILED tests/models/lightonocr/test_modeling_lightonocr.py::LightOnOCRForConditionalGenerationIntegrationTest::test_model_forward_with_images - AttributeError: 'LightOnOCRConfig' object has no attribute 'vocab_size'

For the last two, I thought that was solved? The first one is an "actual" issue, though: you should be able to fix it either by creating the tensor on the same device as the first one, or by making sure you don't have any lingering .to(device) calls, since device_map="auto" should be able to handle the device placement!
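
As a rough illustration of the second option, derive devices from existing tensors instead of hard-coding them and let device_map="auto" do the placement (the function and names below are illustrative, not taken from the PR):

    import torch


    def build_position_ids(inputs_embeds: torch.Tensor) -> torch.Tensor:
        # Create new tensors on the device of an existing input rather than
        # calling .to("cuda") or .to(self.device); with device_map="auto",
        # different layers may live on different GPUs, so hard-coded devices
        # trigger cross-device errors like the one above.
        batch_size, seq_len, _ = inputs_embeds.shape
        position_ids = torch.arange(seq_len, device=inputs_embeds.device)
        return position_ids.unsqueeze(0).expand(batch_size, -1)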

molbap avatar Nov 18 '25 16:11 molbap

Hi @molbap, I fixed the tests. Can you rerun the slow ones? The device test needs to run on several GPUs :)

baptiste-aubertin avatar Nov 20 '25 18:11 baptiste-aubertin

I rebased again and tests not related to our model are still failing :/

baptiste-aubertin avatar Nov 21 '25 10:11 baptiste-aubertin

Hi, I rebased this morning and all the tests are now passing 👍 😄

baptiste-aubertin avatar Nov 24 '25 10:11 baptiste-aubertin

We have had very good results fine-tuning on LightOnOCR, and are looking forward to this being merged so we can use it in production.

eivtho avatar Dec 03 '25 09:12 eivtho

cc @molbap @zucchini-nlp, feel free to tag me when you believe this is ready for final review!

Cyrilvallez avatar Dec 08 '25 11:12 Cyrilvallez

Hi @Cyrilvallez @molbap, I rebased again today. All the tests seem to pass on my side; can you trigger the slow tests? :)

baptiste-aubertin avatar Dec 08 '25 19:12 baptiste-aubertin

run-slow: lightonocr

zucchini-nlp avatar Dec 09 '25 10:12 zucchini-nlp