Awesome-Video-Instance-Segmentation
Awesome video instance segmentation papers
This project is continuously updated. Stars ⭐, comments 💹, and shares 😀 are welcome!
Other awesome projects: Awesome-Referring-Video-Object-Segmentation
2024
| Model | Title | Venue | Type | Paper | Code |
|---|---|---|---|---|---|
| OOKD | Offline-to-Online Knowledge Distillation for Video Instance Segmentation | WACV | Online | ||
| MobileInst | MobileInst: Video Instance Segmentation on the Mobile | AAAI | Online | ||
| LBVQ | Learning Better Video Query with SAM for Video Instance Segmentation | TCSVT | Offline | Code | |
| OMG-Seg | OMG-Seg: Is One Model Good Enough For All Segmentation? | CVPR | Semi-Online | Code | |
| UniVS | UniVS: Unified and Universal Video Segmentation with Prompts as Queries | CVPR | Online | Code | |
| GLEE | General Object Foundation Model for Images and Videos at Scale | CVPR | Offline | Code | |
| RAP-SAM | RAP-SAM: Towards Real-Time All-Purpose Segment Anything | Arxiv | Online | Code | |
| BriVIS | Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation | Arxiv | Offline | Code | |
| VISAGE | VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement | Arxiv | Online | Code | |
| InstFormer | OpenVIS: Open-vocabulary Video Instance Segmentation | Arxiv | Online | ||
| CLIP-VIS | CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation | Arxiv | Online | Code | |
| DVIS-DAQ | DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | Arxiv | Online/Offline | Code | |
| PointVIS | What is Point Supervision Worth in Video Instance Segmentation? | Arxiv | Online | ||
| OW-VISCap | OW-VISCap: Open-World Video Instance Segmentation and Captioning | Arxiv | Online | Code | |
| PM-VIS | PM-VIS: High-Performance Box-Supervised Video Instance Segmentation | Arxiv | Online | ||
2023
| Model | Title | Venue | Type | Paper | Code |
|---|---|---|---|---|---|
| InstanceFormer | InstanceFormer: An Online Video Instance Segmentation Framework | AAAI | Online | Code | |
| GenVIS | A Generalized Framework for Video Instance Segmentation | CVPR | Online/Semi-Online | Code | |
| MDQE | MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos | CVPR | Semi-Online | Code | |
| Mask-Free VIS | Mask-Free Video Instance Segmentation | CVPR | Online | Code | |
| InstMove | InstMove: Instance Motion for Object-centric Video Segmentation | CVPR | Online | Code | |
| VideoCutLER | VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation | CVPR | Offline | Code | |
| TarViS | TarViS: A Unified Approach for Target-based Video Segmentation | CVPR | Offline | Code | |
| CAROQ | Context-Aware Relative Object Queries To Unify Video Instance and Panoptic Segmentation | CVPR | Online | ||
| UNINEXT | Universal Instance Perception as Object Discovery and Retrieval | CVPR | Offline | Code | |
| CTVIS | CTVIS: Consistent Training for Online Video Instance Segmentation | ICCV | Online | Code | |
| DVIS | DVIS: Decoupled Video Instance Segmentation Framework | ICCV | Online/Offline | Code | |
| OV2Seg | Towards Open-Vocabulary Video Instance Segmentation | ICCV | Online | Code | |
| TCOVIS | TCOVIS: Temporally Consistent Online Video Instance Segmentation | ICCV | Online | Code | |
| Tube-Link | Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation | ICCV | Semi-Online | Code | |
| TMT-VIS | TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation | NeurIPS | Offline | Code | |
| NOVIS | NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation | ICML | Semi-Online | ||
| TIVE | TIVE: A Toolbox for Identifying Video Instance Segmentation Errors | Neurocomputing | Toolbox | Code | |
| VLKP | VLKP: Video Instance Segmentation with Visual-Linguistic Knowledge Prompts | ICASSP | Offline | ||
| IAST | IAST: Instance Association Relying on Spatio-Temporal Features for Video Instance Segmentation | ICASSP | Offline | Code | |
| HEVis* | Coarse-to-Fine Video Instance Segmentation With Factorized Conditional Appearance Flows | JAS | Offline | Code | |
| TAFormer | Towards Robust Video Instance Segmentation with Temporal-Aware Transformer | Arxiv | Offline | ||
| UVOSAM | UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model | Arxiv | Online | ||
| RefineVIS | RefineVIS: Video Instance Segmentation with Temporal Attention Refinement | Arxiv | Online | ||
| GRAtt-VIS | GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation | Arxiv | Online | Code | |
| BoxVIS | BoxVIS: Video Instance Segmentation with Box Annotations | Arxiv | Online | Code | |
| OW-VISFormer | Video Instance Segmentation in an Open-World | Arxiv | Offline | Code | |
| DVIS++ | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | Arxiv | Online/Offline | Code | |
2022
| Model | Title | Venue | Type | Paper | Code |
|---|---|---|---|---|---|
| HIATF | Hybrid Instance-Aware Temporal Fusion for Online Video Instance Segmentation | AAAI | Online | ||
| Mask2Former-VIS | Mask2Former for Video Instance Segmentation | CVPR | Offline | Code | |
| Video K-Net | Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation | CVPR | Offline | Code | |
| VISOLO | VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation | CVPR | Online | Code | |
| TeViT | Temporally Efficient Vision Transformer for Video Instance Segmentation | CVPR | Offline | Code | |
| EfficientVIS | Efficient Video Instance Segmentation via Tracklet Query and Proposal | CVPR | Online | Code | |
| SeqFormer | SeqFormer: Sequential Transformer for Video Instance Segmentation | ECCV | Offline | Code | |
| IDOL | In Defense of Online Models for Video Instance Segmentation | ECCV | Online | Code | |
| MS-STS VIS | Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer | ECCV | Offline | Code | |
| Self-Shot VIS | Less than Few: Self-Shot Video Instance Segmentation | ECCV | Offline | ||
| VMT | Video Mask Transfiner for High-Quality Video Instance Segmentation | ECCV | Offline | Code | |
| STC | STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation | ECCV | Online | ||
| IAI | Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation | ECCV | Online | Code | |
| VITA | VITA: Video Instance Segmentation via Object Token Association | NeurIPS | Offline | Code | |
| MinVIS | MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training | NeurIPS | Online | Code | |
| InsPro | InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation | NeurIPS | Online | ||
| SipMaskv2 | SipMaskv2: Enhanced Fast Image and Video Instance Segmentation | TPAMI | Online | Code | |
| TPR | Improving Video Instance Segmentation via Temporal Pyramid Routing | TPAMI | Online | Code | |
| IFA | Video Instance Segmentation by Instance Flow Assembly | TMM | Online | ||
| DefVIS | Deformable VisTR: Spatio temporal deformable attention for video instance segmentation | ICASSP | Offline | Code | |
| TBA | Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation | ICPR | Offline | ||
| DeVIS | DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | Arxiv | Offline | Code | |
| RCF | Online Video Instance Segmentation via Robust Context Fusion | Arxiv | Online | ||
| IFR | Consistent Video Instance Segmentation with Inter-Frame Recurrent Attention | Arxiv | Offline | ||
| ROVIS | Robust Online Video Instance Segmentation with Track Queries | Arxiv | Online | Code | |
| CiCo | One-stage Video Instance Segmentation: From Frame-in Frame-out to Clip-in Clip-out | Arxiv | Offline | Code | |
| TLTM | Two-Level Temporal Relation Model for Online Video Instance Segmentation | Arxiv | Online | Code | |
2021
| Model | Title | Venue | Type | Paper | Code |
|---|---|---|---|---|---|
| CompFeat | CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation | AAAI | Online | Code | |
| VisTR | End-to-End Video Instance Segmentation with Transformers | CVPR | Offline | Code | |
| SG-Net | SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation | CVPR | Online | Code | |
| STMask | Spatial Feature Calibration and Temporal Fusion for Effective One-Stage Video Instance Segmentation | CVPR | Online | Code | |
| CrossVIS | Crossover Learning for Fast Online Video Instance Segmentation | ICCV | Online | Code | |
| Propose-Reduce | Video Instance Segmentation with a Propose-Reduce Paradigm | ICCV | Offline | Code | |
| VisSTG | End-to-end Video Instance Segmentation via Spatial-Temporal Graph Neural Networks | ICCV | Online | Code | |
| QueryInst | Instances as Queries | ICCV | Online | Code | |
| HEVis | Learning Hierarchical Embedding for Video Instance Segmentation | ACM MM | Offline | Code | |
| SRNet | SRNet: Spatial Relation Network for Efficient Single-stage Instance Segmentation in Videos | ACM MM | Online | ||
| IFC | Video Instance Segmentation using Inter-Frame Communication Transformers | NeurIPS | Offline | Code | |
| PCAN | Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation | NeurIPS | Online | Code | |
| CMaskTrack R-CNN | Occluded Video Instance Segmentation: A Benchmark | IJCV | Online | Dataset | |
| RGNNVIS++ | Recurrent Graph Neural Networks for Video Instance Segmentation | IJCV | Online | Code | |
2020
| Model | Title | Venue | Type | Paper | Code |
|---|---|---|---|---|---|
| MaskProp | Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation | CVPR | Offline | ||
| VAE | Video Instance Segmentation Tracking with a Modified VAE Architecture | CVPR | Online | ||
| SipMask | SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation | ECCV | Online | Code | |
| STEm-Seg | STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos | ECCV | Offline | Code | |
| RGNNVIS | Learning Video Instance Segmentation with Recurrent Graph Neural Networks | GCPR | Online | Code | |
2019
| Model | Title | Venue | Type | Paper | Code |
|---|---|---|---|---|---|
| MaskTrack R-CNN | Video Instance Segmentation | ICCV | Online | Code | |