
Add EoMT from "Your ViT is Secretly an Image Segmentation Model"

Open tcourat opened this issue 8 months ago • 2 comments

Hi, I'm here to share a new image segmentation paper that uses a ViT!

Paper: https://arxiv.org/abs/2503.19108
Code: https://github.com/tue-mps/eomt

This paper reaches near-SOTA results with a considerably simpler architecture (a vision transformer only), provided the ViT is already well pretrained. EoMT uses only the plain ViT architecture, plus a few extra learned queries and a small mask prediction module. It performs on par with ViT-Adapter + Mask2Former while being much less complex.

It would be interesting to have it in this library!
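To make the "plain ViT + a few learned queries + small mask module" idea concrete, here is a rough toy sketch in PyTorch. It is not the EoMT code: the class name `TinyQueryViT`, all sizes, the use of `nn.TransformerEncoderLayer` as a stand-in for ViT blocks, and the choice to let the queries join the patch tokens only for the last `query_depth` blocks are assumptions made for illustration. Mask logits are taken as a dot product between processed queries and patch tokens, at patch resolution only.

```python
import torch
import torch.nn as nn


class TinyQueryViT(nn.Module):
    """Toy illustration only: plain encoder blocks + learned queries + mask head."""

    def __init__(self, img_size=32, patch=8, dim=64, depth=4, query_depth=2,
                 num_queries=5, num_classes=3):
        super().__init__()
        assert 1 <= query_depth <= depth
        self.grid = img_size // patch
        # Patch embedding and learned position embedding, as in a plain ViT.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))
        # Stand-in for ViT blocks (hypothetical choice for this sketch).
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, nhead=4, dim_feedforward=4 * dim,
                                       batch_first=True, norm_first=True)
            for _ in range(depth)
        ])
        self.query_depth = query_depth
        # The "few extra learned queries".
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        # Small prediction module: class logits (+1 for "no object") and mask embedding.
        self.class_head = nn.Linear(dim, num_classes + 1)
        self.mask_mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        b = x.shape[0]
        n_q = self.queries.shape[1]
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos
        first_joint = len(self.blocks) - self.query_depth
        for i, blk in enumerate(self.blocks):
            if i == first_joint:
                # From here on, queries and patch tokens attend to each other.
                tokens = torch.cat([self.queries.expand(b, -1, -1), tokens], dim=1)
            tokens = blk(tokens)
        q, patches = tokens[:, :n_q], tokens[:, n_q:]
        class_logits = self.class_head(q)
        # One mask per query: dot product of query embedding with each patch token.
        mask_logits = torch.einsum("bqc,bnc->bqn", self.mask_mlp(q), patches)
        mask_logits = mask_logits.reshape(b, n_q, self.grid, self.grid)
        return class_logits, mask_logits


model = TinyQueryViT().eval()
with torch.no_grad():
    class_logits, mask_logits = model(torch.zeros(2, 3, 32, 32))
print(class_logits.shape, mask_logits.shape)
```

The point of the sketch is only that no adapter, pixel decoder, or separate transformer decoder appears: the queries are extra tokens in the same encoder.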

tcourat avatar Apr 19 '25 11:04 tcourat

Hey @tcourat, indeed, super nice work!

I would be very happy to have it in the library, but I have some concerns. It's an instance/panoptic segmentation model, and it would be the first model of that kind here. Adding it with training support may therefore not be straightforward: a Matcher and loss need to be defined, and the training architecture differs somewhat from the inference one.

That said, if anyone is up for the challenge, I would greatly appreciate it and would help with the integration!
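For readers unfamiliar with the Matcher mentioned above: in query-based models each predicted mask has to be paired with a ground-truth mask before a loss can be computed. Below is a minimal sketch of that pairing step, under loud assumptions: it uses a Dice cost only, and it brute-forces all assignments instead of running the Hungarian algorithm, so it is only usable for a handful of masks. Mask2Former-style matchers typically combine a class cost with mask costs; none of that is shown here. The function names `dice_cost` and `match` are hypothetical.

```python
from itertools import permutations


def dice_cost(pred, target, eps=1.0):
    """1 - Dice overlap between a flat list of mask probabilities and 0/1 labels."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def match(pred_masks, gt_masks):
    """Return (pred_idx, gt_idx) pairs minimising the total Dice cost.

    Brute force over all assignments (a stand-in for Hungarian matching).
    Unmatched predictions would be treated as "no object" by the loss.
    """
    assert len(pred_masks) >= len(gt_masks)
    best_cost, best = float("inf"), ()
    for perm in permutations(range(len(pred_masks)), len(gt_masks)):
        cost = sum(dice_cost(pred_masks[p], gt_masks[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return [(p, g) for g, p in enumerate(best)]


# Three predicted masks (4 pixels each), two ground-truth masks.
preds = [[0.9, 0.8, 0.1, 0.0], [0.1, 0.0, 0.9, 0.7], [0.2, 0.2, 0.2, 0.2]]
gts = [[0, 0, 1, 1], [1, 1, 0, 0]]
pairs = match(preds, gts)
print(pairs)  # [(1, 0), (0, 1)]
```

Prediction 2 stays unmatched here, which is the case a "no object" class target would cover during training.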

qubvel avatar Apr 19 '25 22:04 qubvel

@qubvel They have semantic segmentation results in their readme: https://github.com/tue-mps/eomt?tab=readme-ov-file#semantic-segmentation

ogencoglu avatar Jun 10 '25 11:06 ogencoglu