Add ConvNeXt Mask R-CNN
What does this PR do?
This PR is an initial draft implementing the classic Mask R-CNN framework with ConvNeXt as the backbone.
The framework is implemented in a single script, with the exception of 3 files (for now):
- assign_result.py
- losses.py
- mask_target.py
Because we have a one model, one file policy, I'm reimplementing ConvNeXt using Copied from statements, so ConvNextMaskRCNNModel is almost identical to ConvNextModel. This keeps the backbone used for object detection independent of the original one, which matters whenever the two need to differ. Here, for instance, extra layernorms are added after each stage.
There's a dependency on torchvision, which is used for NMS (non-maximum suppression, a postprocessing algorithm used by both the RPN head and the RoI head).
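For readers unfamiliar with NMS, here is a minimal pure-Python sketch of the greedy algorithm. It is not the code in this PR and not torchvision's implementation. The names `iou` and `nms` are illustrative, and boxes are assumed to be `(x1, y1, x2, y2)` tuples. It follows the usual convention of dropping any box whose IoU with an already kept, higher-scoring box exceeds the threshold.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_threshold=0.5)
print(kept)
```

In the example, the second box overlaps the first with an IoU of about 0.68, so it is suppressed at a threshold of 0.5, while the third box does not overlap either and is kept. A batched, tensor-based version of this is what a library routine would provide for the RPN head and the RoI head.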