[WIP] Add OmDet-Turbo
What does this PR do?
This PR adds support for OmDet-Turbo, an open-vocabulary detection model from Om Research Lab.
Who can review?
@amyeroberts @qubvel
Hey @amyeroberts @qubvel! I think this PR is ready for a first review. What's missing for now is adding/modifying the tests (the current ones are for Grounding Dino and not yet adapted to this model). Short-term TODOs left:
- Tests
- Docs, examples and some docstrings left to modify
Longer-term TODOs:
- Add support for training
Hi @amyeroberts and @qubvel! When you have some time, could you please take another look at this PR? I've resolved your previous remarks and left open the ones where I had questions.
All the files I've modified are ready for review. I've commented out the modeling tests I haven't addressed yet (there's only one end-to-end integration test for now), but the processor tests are also ready.
I'd like to highlight a significant issue with OmDetTurboMultiheadAttention in the OmDetTurboEncoderLayer module. It faithfully reproduces the original model's behavior, but that behavior appears to be incorrectly implemented for batch inference. As a result, the whole model is unusable for batch inference if we want to use the original model weights and stay consistent with the original model's outputs (see the sketch below for how the discrepancy shows up).
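To make the problem concrete, here is a minimal sketch of the kind of check that exposes it. The checkpoint path, the `OmDetTurboProcessor`/`OmDetTurboForObjectDetection` usage, and the `decoder_coord_logits` attribute name are assumptions for illustration, not necessarily the final API of this PR:

```python
# Sketch only: checkpoint path and output attribute names are placeholders.
import numpy as np
import torch
from PIL import Image

from transformers import OmDetTurboForObjectDetection, OmDetTurboProcessor

checkpoint = "path/to/converted-omdet-turbo"  # placeholder, not a released checkpoint
processor = OmDetTurboProcessor.from_pretrained(checkpoint)
model = OmDetTurboForObjectDetection.from_pretrained(checkpoint).eval()

# Two dummy images with per-image open-vocabulary class prompts
images = [Image.fromarray(np.zeros((480, 640, 3), dtype=np.uint8)) for _ in range(2)]
classes = [["cat", "remote"], ["car"]]

with torch.no_grad():
    # First sample processed alone
    single_inputs = processor(images=images[0], text=classes[0], return_tensors="pt")
    single_out = model(**single_inputs)

    # Same sample processed inside a batch of two
    batch_inputs = processor(images=images, text=classes, return_tensors="pt")
    batch_out = model(**batch_inputs)

# With correctly batched attention these should match; with the issue above they diverge.
print(torch.allclose(single_out.decoder_coord_logits[0],
                     batch_out.decoder_coord_logits[0], atol=1e-4))
```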
Thanks in advance!
Thanks for the review @qubvel! For OmdetTurboModel, the task-specific part of the model starts at the very beginning of the decoder, where two heads are defined: one for the object detection scores and one for the bounding box coordinates. So it is quite difficult to define a non-task-specific OmdetTurboModel (see the sketch below). It wouldn't be the first model not to have a FooModel; LLaVaNext doesn't have one either, maybe for the same reason?
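To illustrate the point, here is a simplified sketch with hypothetical module names (not the PR's actual code): each decoder layer already owns a scoring head and a box-regression head, so stripping the heads off would leave a base model with nothing meaningful to return.

```python
# Illustrative only: hypothetical class and head names, not the PR's actual modules.
import torch
from torch import nn


class IllustrativeOmDetDecoderLayer(nn.Module):
    def __init__(self, hidden_size: int = 256, label_dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # The task-specific heads live inside the decoder layer, not in a wrapper on top:
        self.score_head = nn.Linear(hidden_size, label_dim)  # object detection scores
        self.bbox_head = nn.Linear(hidden_size, 4)           # box coordinates (cx, cy, w, h)

    def forward(self, queries: torch.Tensor):
        attn_out, _ = self.self_attn(queries, queries, queries)
        queries = queries + attn_out
        scores = self.score_head(queries)          # refined at every decoder layer
        boxes = self.bbox_head(queries).sigmoid()  # refined at every decoder layer
        return queries, scores, boxes


# Tiny usage example
layer = IllustrativeOmDetDecoderLayer()
queries = torch.randn(2, 100, 256)  # (batch, num_queries, hidden_size)
_, scores, boxes = layer(queries)
print(scores.shape, boxes.shape)  # torch.Size([2, 100, 512]) torch.Size([2, 100, 4])
```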
@amyeroberts if you think not having an OmdetTurboModel is OK, then this should be ready for you to review! Sorry, I probably pinged you for a final review too early before, but I'm hopeful it's better this time :)