transformers icon indicating copy to clipboard operation
transformers copied to clipboard

[WIP] Add OmDet-Turbo

Open yonigozlan opened this issue 1 year ago • 4 comments

What does this PR do?

This PR adds support for OmDet-Turbo, an open-vocabulary detection model from Om Research Lab.

Who can review?

@amyeroberts @qubvel

yonigozlan avatar Jul 08 '24 15:07 yonigozlan

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Hey @amyeroberts @qubvel ! I think this PR is ready for a first review. What's missing for now is adding/modifying tests (current ones are for Grounding Dino and not adapted to this model). Short-terms TODOs left:

  • Tests
  • Docs, examples and some docstrings left to modify

Longer-terms TODOs:

  • Add support for training

yonigozlan avatar Jul 17 '24 22:07 yonigozlan

Hi @amyeroberts and @qubvel! When you have some time, could you please take another look at this PR? I've resolved your previous remarks and left the ones where I had questions open.

All the files I've modified are ready for review. I've commented out the modeling tests I haven't addressed yet (there's only one end-to-end integration test for now), but the processor tests are ready for review.

I'd like to highlight a significant issue: the OmDetTurboMultiheadAttention in the OmDetTurboEncoderLayer module. It reproduces the original model's behavior, but this one appears to be incorrectly implemented for batch inference. This issue makes the whole model unusable for batch inference if we want to use the original model weights and be consistent with the original model outputs.

Thanks in advance!

yonigozlan avatar Jul 22 '24 22:07 yonigozlan

Thanks for the review @qubvel ! For the OmdetTurboModel, the task specific part of the model starts at the very beginning of the decoder, where there are two heads defined, one for the object detection scores and the other for the bboxes coordinates. So it is quite difficult to define a non task-specific OmdetTurboModel. It wouldn't be the first model not to have a FooModel, for example LLaVaNext doesn't have one either, maybe for the same reason? @amyeroberts if you think not having OmdetTurboModel is ok, I think this should be ready for you to review! Sorry I probably pinged you for a final review too early before, but I'm hopeful it's better this time :)

yonigozlan avatar Aug 16 '24 18:08 yonigozlan