segmentation_models.pytorch Add EoMT from "Your ViT is Secretly an Image Segmentation Model"

Hi, I'm here to share a new image segmentation paper that uses a plain ViT!

Paper: https://arxiv.org/abs/2503.19108
Code: https://github.com/tue-mps/eomt

This paper reaches near-SOTA results with a considerably simpler architecture (a vision transformer only), provided the ViT is already well pretrained. EoMT uses only the plain ViT architecture plus a few extra learned queries and a small mask prediction module. It performs on par with ViT-Adapter + Mask2Former while being much less complex.

It would be interesting to have it in this library!

Apr 19 '25 11:04 tcourat

Hey @tcourat, indeed, super nice work!

I would be very happy to have it in the library, however, I have some concerns. It's an instance/panoptic segmentation model and it would be the first model of such a class. So it may not be straightforward to add it with training because the Matcher and loss need to be defined, and the training architecture is a bit different from the inference one.
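To make the Matcher concern concrete: in Mask2Former-style training, each ground-truth segment is assigned to one predicted query by bipartite matching, and the loss is computed on the matched pairs. Below is a minimal sketch of that idea, not the EoMT or Mask2Former implementation. It assumes a toy cost of `1 - Dice` on flattened masks (real matchers typically also add class and mask cross-entropy terms) and uses a brute-force search over permutations in place of the Hungarian algorithm. The names `dice_cost` and `match_queries` are hypothetical.

```python
from itertools import permutations


def dice_cost(pred, target, eps=1e-6):
    """Return 1 - Dice between two flattened masks with values in [0, 1]."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def match_queries(pred_masks, gt_masks):
    """Assign each ground-truth mask to a distinct query, minimizing total cost.

    Assumes len(pred_masks) >= len(gt_masks). Brute force, so only usable
    for tiny examples; a real matcher would use the Hungarian algorithm.
    Returns a list of (query_index, gt_index) pairs.
    """
    cost = [[dice_cost(p, g) for g in gt_masks] for p in pred_masks]
    best, best_total = (), float("inf")
    for queries in permutations(range(len(pred_masks)), len(gt_masks)):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best, best_total = queries, total
    return [(q, g) for g, q in enumerate(best)]


# Toy data: 3 queries, 2 ground-truth segments, 4 pixels each.
pred_masks = [
    [0.9, 0.8, 0.1, 0.0],  # query 0: covers the left half
    [0.1, 0.0, 0.9, 0.7],  # query 1: covers the right half
    [0.2, 0.2, 0.2, 0.2],  # query 2: weak everywhere
]
gt_masks = [
    [0, 0, 1, 1],  # segment 0: right half
    [1, 1, 0, 0],  # segment 1: left half
]
matches = match_queries(pred_masks, gt_masks)
print(matches)
```

Queries left unmatched (query 2 here) would be supervised toward a "no object" class, which is part of what makes the training setup differ from a plain per-pixel loss.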

However, in case anyone is happy to challenge themselves, I would greatly appreciate it and would help with the integration!

Apr 19 '25 22:04 qubvel

@qubvel They have semantic segmentation results in their readme: https://github.com/tue-mps/eomt?tab=readme-ov-file#semantic-segmentation
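For reference, mask-classification models in the Mask2Former family usually produce a semantic map by weighting each query's mask by its class probabilities and taking the per-pixel argmax. The sketch below illustrates that combination in plain Python on flattened pixels. It assumes the last class logit is the "no object" class and that EoMT's semantic output follows the same recipe, which is not verified against the EoMT code. The function name `semantic_from_queries` is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_from_queries(class_logits, mask_logits):
    """Combine per-query outputs into one class label per pixel.

    class_logits: [Q][C + 1], last entry assumed to be "no object".
    mask_logits:  [Q][P], one logit per flattened pixel.
    Returns a list of P class indices.
    """
    num_queries = len(class_logits)
    num_classes = len(class_logits[0]) - 1
    # Drop the "no object" probability after the softmax.
    class_probs = [softmax(row)[:num_classes] for row in class_logits]
    mask_probs = [[sigmoid(v) for v in row] for row in mask_logits]
    labels = []
    for p in range(len(mask_logits[0])):
        # Score of class c at pixel p: sum over queries of P(class) * P(mask).
        scores = [
            sum(class_probs[q][c] * mask_probs[q][p] for q in range(num_queries))
            for c in range(num_classes)
        ]
        labels.append(max(range(num_classes), key=scores.__getitem__))
    return labels


# Toy data: 2 queries, 2 classes (+ "no object"), 4 pixels.
class_logits = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
mask_logits = [[5.0, 5.0, -5.0, -5.0], [-5.0, -5.0, 5.0, 5.0]]
labels = semantic_from_queries(class_logits, mask_logits)
print(labels)
```

With this kind of post-processing, the model's output has the same shape as a regular semantic segmentation prediction, which is why the inference path could be simpler to integrate than the training path.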

Jun 10 '25 11:06 ogencoglu