HOI-Learning-List
A list of Human-Object Interaction Learning.
Some recent (2015-now) Human-Object Interaction Learning studies. If you find any errors or problems, please feel free to comment.
A list of Transformer-based vision works: https://github.com/DirtyHarryLYL/Transformer-in-Vision.
Dataset/Benchmark
- HOI-COCO (CVPR2021) [Website]
- PaStaNet-HOI (TPAMI2021) [Benchmark]
- HAKE (CVPR2020) [YouTube] [bilibili] [Website] [Paper] [HAKE-Action-Torch] [HAKE-Action-TF]
- PIC [Website]
Video HOI Datasets
- VidHOI [Paper]
- AVA [Website], HOIs (human-object, human-human) and pose (body motion) actions
- Action Genome [Website], action verbs and spatial relationships
Method
HOI Image Generation
- Exploiting Relationship for Complex-scene Image Generation (arXiv 2021.04) [Paper]
- Specifying Object Attributes and Relations in Interactive Scene Generation (arXiv 2019.11) [Paper]
HOI Recognition: Image-based, to recognize all the HOIs in one image.
- DEFR (arXiv 2021.12) [Paper]
- Interaction Compass (ICCV 2021) [Paper]
- DEFR-CLIP (arXiv 2021.07) [Paper]
- PaStaNet: Toward Human Activity Knowledge Engine (CVPR2020) [Code] [Data] [Paper] [YouTube] [bilibili]
- Pairwise (ECCV2018) [Paper]
- Attentional Pooling for Action Recognition (NIPS2017) [Code] [Paper]
- Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering (ECCV2016) [Code] [Paper]
- Contextual Action Recognition with R*CNN (ICCV2015) [Code] [Paper]
- SGAP-Net (AAAI2020) [Paper]
Unseen or zero-shot learning (image-level recognition).
- Compositional Learning for Human Object Interaction (ECCV2018) [Paper]
- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 2020) [Paper]
HOI Detection: Instance-based, to detect the human-object pairs and classify the interactions.
- K-BAN (arXiv 2022), [Paper]
- SGCN4HOI (arXiv 2022), [Paper]
- ODM (arXiv 2022), [Paper]
- SDT (arXiv 2022), [Paper]
- STIP (CVPR 2022), [Paper]
- DT (CVPR 2022), [Paper]
- CATN (arXiv 2022), [Paper]
- SSRT (CVPR 2022), [Paper]
- MSTR (CVPR 2022), [Paper]
- Iwin (arXiv 2022.3), [Paper]
- RGBM (arXiv 2022.2), [Paper]
- PhraseHOI (AAAI 2022) [Paper]
- DEFR (arXiv 2021.12) [Paper]
- HRNet (TIP 2021) [Paper]
- SG2HOI (ICCV 2021) [Paper]
- HOI-MO-Net (IVC 2021) [Paper]
- IPGN (TIP 2021.7) [Paper]
- Human Object Interaction Detection using Two-Direction Spatial Enhancement and Exclusive Object Prior (arXiv) [Paper]
- PST (ICCV2021) [Paper]
- RR-Net (arXiv 2021.5) [Paper]
- End-to-End Human Object Interaction Detection with HOI Transformer (CVPR2021), [Paper], [Code]
- DIRV (AAAI2021) [Paper]
- DecAug (AAAI2021) [Paper]
- OSGNet (IEEE Access) [Paper]
- PFNet (CVM) [Paper]
- UniDet (ECCV2020) [Paper]
- FCMNet (ECCV2020) [Paper]
- Contextual Heterogeneous Graph Network for Human-Object Interaction Detection (ECCV2020) [Paper]
- ConsNet (ACMMM2020) [Paper] [Code], HICO-DET Python API: A general Python toolkit for the HICO-DET dataset, including APIs for data loading & processing, human-object pair IoU & NMS calculation, and standard evaluation. [Code] [Documentation]
- Action-Guided Attention Mining and Relation Reasoning Network for Human-Object Interaction Detection (IJCAI2020) [Paper]
- PaStaNet (CVPR2020) [Code] [Data] [Paper] [YouTube] [bilibili]
- Cascaded Human-Object Interaction Recognition (CVPR2020) [Code] [Paper]
- Diagnosing Rarity in Human-Object Interaction Detection (CVPRW2020) [Paper]
- MLCNet (ICMR2020) [Paper]
- SIGN (ICME2020) [Paper]
- In-GraphNet (IJCAI-PRICAI 2020) [Paper]
- RPNN (ICCV2019) [Paper]
- Deep Contextual Attention for Human-Object Interaction Detection (ICCV2019) [Paper]
- Turbo (AAAI2019) [Paper]
- InteractNet (CVPR2018) [Paper]
- Scaling Human-Object Interaction Recognition through Zero-Shot Learning (WACV2018) [Paper]
- VS-GATs (Mar. 2020) [Paper]
- Classifying All Interacting Pairs in a Single Shot (Jan. 2020) [Paper]
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization (May. 2020) [Paper]
- SABRA (Dec 2020) [Paper]
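The ConsNet entry above mentions human-object pair IoU. The following is a minimal sketch of one common way to define it, assuming boxes in `(x1, y1, x2, y2)` format and taking the pair IoU as the minimum of the human-box IoU and the object-box IoU. The function names `box_iou` and `pair_iou` are hypothetical and are not the API of the ConsNet toolkit.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Width and height of the intersection rectangle (zero if disjoint).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pair_iou(pred, gt):
    """Pair IoU for (human_box, object_box) tuples.

    Assumption: the pair overlap is the weaker of the two box overlaps,
    so a pair only matches when both the human and the object match.
    """
    return min(box_iou(pred[0], gt[0]), box_iou(pred[1], gt[1]))
```

With this definition, a predicted pair would typically be compared against a ground-truth pair by thresholding `pair_iou` (0.5 is the threshold commonly used for HICO-DET), which requires both boxes to be localized well.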
Unseen or zero/low-shot or weakly-supervised learning (instance-level detection).
- Align-Former (BMVC 2021), [Paper]
- Discovering Human Interactions with Large-Vocabulary Objects via Query and Multi-Scale Detection (ICCV2021) [Paper], [Code]
- DGIG-Net (TOC2021) [Paper]
- Detecting Human-Object Interaction with Mixed Supervision (WACV 2021) [Paper]
- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 2020) [Paper]
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization (May. 2020) [Paper]
- Functional (AAAI2020) [Paper]
- Scaling Human-Object Interaction Recognition through Zero-Shot Learning (WACV2018) [Paper]
Video HOI methods
- SPDTP (arXiv, Jun 2022), [Paper]
- V-HOI (arXiv, Jun 2022), [Paper]
- Detecting Human-Object Relationships in Videos (ICCV2021) [Paper]
- VidHOI (May 2021), [Paper]
- Generating Videos of Zero-Shot Compositions of Actions and Objects (Jul 2020), HOI GAN, [Paper]
- Grounded Human-Object Interaction Hotspots from Video (ICCV2019) [Code] [Paper]
Result
PaStaNet-HOI:
Proposed in the TPAMI version of TIN (Transferable Interactiveness Network). It is built on HAKE data and includes 110K+ images and 520 HOIs (the 80 "no_interaction" HOIs of HICO-DET are excluded to avoid incomplete labeling). It has a more severe long-tailed data distribution and is thus more difficult.
Detector: COCO pre-trained
Method | mAP |
---|---|
iCAN | 11.00 |
iCAN+NIS | 13.13 |
TIN | 15.38 |
HICO-DET:
1) Detector: COCO pre-trained
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) | Full(ko) | Rare(ko) | Non-Rare(ko) |
---|---|---|---|---|---|---|---|
Shen et al. | WACV2018 | 6.46 | 4.24 | 7.12 | - | - | - |
HO-RCNN | WACV2018 | 7.81 | 5.37 | 8.54 | 10.41 | 8.94 | 10.85 |
InteractNet | CVPR2018 | 9.94 | 7.16 | 10.77 | - | - | - |
Turbo | AAAI2019 | 11.40 | 7.30 | 12.60 | - | - | - |
GPNN | ECCV2018 | 13.11 | 9.34 | 14.23 | - | - | - |
Xu et al. | ICCV2019 | 14.70 | 13.26 | 15.13 | - | - | - |
iCAN | BMVC2018 | 14.84 | 10.45 | 16.15 | 16.26 | 11.33 | 17.73 |
Wang et al. | ICCV2019 | 16.24 | 11.16 | 17.75 | 17.73 | 12.78 | 19.21 |
Lin et al. | IJCAI2020 | 16.63 | 11.30 | 18.22 | 19.22 | 14.56 | 20.61 |
Functional (suppl) | AAAI2020 | 16.96 | 11.73 | 18.52 | - | - | - |
Interactiveness | CVPR2019 | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
No-Frills | ICCV2019 | 17.18 | 12.17 | 18.68 | - | - | - |
RPNN | ICCV2019 | 17.35 | 12.78 | 18.71 | - | - | - |
PMFNet | ICCV2019 | 17.46 | 15.65 | 18.00 | 20.34 | 17.47 | 21.20 |
SIGN | ICME2020 | 17.51 | 15.31 | 18.53 | 20.49 | 17.53 | 21.51 |
Interactiveness-optimized | CVPR2019 | 17.54 | 13.80 | 18.65 | 19.75 | 15.70 | 20.96 |
Liu et al. | arXiv | 17.55 | 20.61 | - | - | - | - |
Wang et al. | ECCV2020 | 17.57 | 16.85 | 17.78 | 21.00 | 20.74 | 21.08 |
In-GraphNet | IJCAI-PRICAI 2020 | 17.72 | 12.93 | 19.31 | - | - | - |
HOID | CVPR2020 | 17.85 | 12.85 | 19.34 | - | - | - |
MLCNet | ICMR2020 | 17.95 | 16.62 | 18.35 | 22.28 | 20.73 | 22.74 |
SAG | arXiv | 18.26 | 13.40 | 19.71 | - | - | - |
Sarullo et al. | arXiv | 18.74 | - | - | - | - | - |
DRG | ECCV2020 | 19.26 | 17.74 | 19.71 | 23.40 | 21.75 | 23.89 |
Analogy | ICCV2019 | 19.40 | 14.60 | 20.90 | - | - | - |
VCL | ECCV2020 | 19.43 | 16.55 | 20.29 | 22.00 | 19.09 | 22.87 |
VS-GATs | arXiv | 19.66 | 15.79 | 20.81 | - | - | - |
VSGNet | CVPR2020 | 19.80 | 16.05 | 20.91 | - | - | - |
PFNet | CVM | 20.05 | 16.66 | 21.07 | 24.01 | 21.09 | 24.89 |
ATL(w/ COCO) | CVPR2021 | 20.08 | 15.57 | 21.43 | - | - | - |
FCMNet | ECCV2020 | 20.41 | 17.34 | 21.56 | 22.04 | 18.97 | 23.12 |
ACP | ECCV2020 | 20.59 | 15.92 | 21.98 | - | - | - |
PD-Net | ECCV2020 | 20.81 | 15.90 | 22.28 | 24.78 | 18.88 | 26.54 |
SG2HOI | ICCV2021 | 20.93 | 18.24 | 21.78 | 24.83 | 20.52 | 25.32 |
TIN-PAMI | TPAMI2021 | 20.93 | 18.95 | 21.32 | 23.02 | 20.96 | 23.42 |
ATL | CVPR2021 | 21.07 | 16.79 | 22.35 | - | - | - |
PMN | arXiv | 21.21 | 17.60 | 22.29 | - | - | - |
IPGN | TIP2021 | 21.26 | 18.47 | 22.07 | - | - | - |
DJ-RN | CVPR2020 | 21.34 | 18.53 | 22.18 | 23.69 | 20.64 | 24.60 |
OSGNet | IEEE Access | 21.40 | 18.12 | 22.38 | - | - | - |
K-BAN | arXiv2022 | 21.48 | 16.85 | 22.86 | 24.29 | 19.09 | 25.85 |
SCG+ODM | arXiv | 21.50 | 17.59 | 22.67 | - | - | - |
DIRV | AAAI2021 | 21.78 | 16.38 | 23.39 | 25.52 | 20.84 | 26.92 |
SCG | ICCV2021 | 21.85 | 18.11 | 22.97 | - | - | - |
HRNet | TIP2021 | 21.93 | 16.30 | 23.62 | 25.22 | 18.75 | 27.15 |
ConsNet | ACMMM2020 | 22.15 | 17.55 | 23.52 | 26.57 | 20.8 | 28.3 |
IDN | NeurIPS2020 | 23.36 | 22.47 | 23.63 | 26.43 | 25.01 | 26.85 |
QAHOI-Res50 | arXiv2021 | 24.35 | 16.18 | 26.80 | - | - | - |
DOQ | CVPR2022 | 25.97 | 26.09 | 25.93 | - | - | - |
STIP | CVPR2022 | 28.81 | 27.55 | 29.18 | 32.28 | 31.07 | 32.64 |
2) Detector: pre-trained on COCO, fine-tuned on HICO-DET train set (with GT human-object pair boxes) or one-stage detector (point-based, transformer-based)
A fine-tuned detector learns to detect only the interactive humans and objects (those with interactiveness), which suppresses many wrong pairings (non-interactive human-object pairs) and boosts performance.
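The effect described above can be illustrated with a toy sketch. It assumes a two-stage pipeline that pairs every detected person with every other detection, and it uses a hypothetical `interactive` flag as a stand-in for what a fine-tuned detector would keep. The detections and the helper `candidate_pairs` are made up for illustration only.

```python
from itertools import product


def candidate_pairs(detections):
    """Pair every detected person with every other detection."""
    humans = [d for d in detections if d["label"] == "person"]
    return [(h, o) for h, o in product(humans, detections) if h is not o]


# Hypothetical detections; "interactive" marks instances that take part
# in at least one annotated interaction.
detections = [
    {"id": 0, "label": "person", "interactive": True},
    {"id": 1, "label": "person", "interactive": False},
    {"id": 2, "label": "bicycle", "interactive": True},
    {"id": 3, "label": "car", "interactive": False},
]

# A COCO-style detector returns everything: 2 persons x 3 other boxes.
all_pairs = candidate_pairs(detections)

# A detector that only fires on interactive instances leaves far fewer pairs.
kept = candidate_pairs([d for d in detections if d["interactive"]])

print(len(all_pairs), len(kept))  # 6 1
```

In this toy case five of the six exhaustive pairings are non-interactive, which is the kind of false pairing that the second setting avoids and part of why its numbers are not directly comparable with the first.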
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) | Full(ko) | Rare(ko) | Non-Rare(ko) |
---|---|---|---|---|---|---|---|
UniDet | ECCV2020 | 17.58 | 11.72 | 19.33 | 19.76 | 14.68 | 21.27 |
IP-Net | CVPR2020 | 19.56 | 12.79 | 21.58 | 22.05 | 15.77 | 23.92 |
RR-Net | arXiv | 20.72 | 13.21 | 22.97 | - | - | - |
PPDM (paper) | CVPR2020 | 21.10 | 14.46 | 23.09 | - | - | - |
PPDM (github-hourglass104) | CVPR2020 | 21.73/21.94 | 13.78/13.97 | 24.10/24.32 | 24.58/24.81 | 16.65/17.09 | 26.84/27.12 |
Functional | AAAI2020 | 21.96 | 16.43 | 23.62 | - | - | - |
SABRA-Res50 | arXiv | 23.48 | 16.39 | 25.59 | 28.79 | 22.75 | 30.54 |
VCL | ECCV2020 | 23.63 | 17.21 | 25.55 | 25.98 | 19.12 | 28.03 |
ATL | CVPR2021 | 23.67 | 17.64 | 25.47 | 26.01 | 19.60 | 27.93 |
PST | ICCV2021 | 23.93 | 14.98 | 26.60 | 26.42 | 17.61 | 29.05 |
SABRA-Res50FPN | arXiv | 24.12 | 15.91 | 26.57 | 29.65 | 22.92 | 31.65 |
ATL(w/ COCO) | CVPR2021 | 24.50 | 18.53 | 26.28 | 27.23 | 21.27 | 29.00 |
IDN | NeurIPS2020 | 24.58 | 20.33 | 25.86 | 27.89 | 23.64 | 29.16 |
FCL | CVPR2021 | 24.68 | 20.03 | 26.07 | 26.80 | 21.61 | 28.35 |
HOTR | CVPR2021 | 25.10 | 17.34 | 27.42 | - | - | - |
FCL+VCL | CVPR2021 | 25.27 | 20.57 | 26.67 | 27.71 | 22.34 | 28.93 |
OC-Immunity | AAAI2022 | 25.44 | 23.03 | 26.16 | 27.24 | 24.32 | 28.11 |
ConsNet-F | ACMMM2020 | 25.94 | 19.35 | 27.91 | 30.34 | 23.4 | 32.41 |
SABRA-Res152 | arXiv | 26.09 | 16.29 | 29.02 | 31.08 | 23.44 | 33.37 |
QAHOI-Res50 | arXiv2021 | 26.18 | 18.06 | 28.61 | - | - | - |
Zou et al. | CVPR2021 | 26.61 | 19.15 | 28.84 | 29.13 | 20.98 | 31.57 |
RGBM | arXiv2022 | 27.39 | 21.34 | 29.20 | 30.87 | 24.20 | 32.87 |
GTNet | arXiv | 28.03 | 22.73 | 29.61 | 29.98 | 24.13 | 31.73 |
K-BAN | arXiv2022 | 28.83 | 20.29 | 31.31 | 31.05 | 21.41 | 33.93 |
AS-Net | CVPR2021 | 28.87 | 24.25 | 30.25 | 31.74 | 27.07 | 33.14 |
QPIC-Res50 | CVPR2021 | 29.07 | 21.85 | 31.23 | 31.68 | 24.14 | 33.93 |
GGNet | CVPR2021 | 29.17 | 22.13 | 30.84 | 33.50 | 26.67 | 34.89 |
QPIC-CPC | CVPR2022 | 29.63 | 23.14 | 31.57 | - | - | - |
QPIC-Res101 | CVPR2021 | 29.90 | 23.92 | 31.69 | 32.38 | 26.06 | 34.27 |
SCG | ICCV2021 | 29.26 | 24.61 | 30.65 | 32.87 | 27.89 | 34.35 |
PhraseHOI | AAAI2022 | 30.03 | 23.48 | 31.99 | 33.74 | 27.35 | 35.64 |
MSTR | CVPR2022 | 31.17 | 25.31 | 32.92 | 34.02 | 28.83 | 35.57 |
SSRT | CVPR2022 | 31.34 | 24.31 | 33.32 | - | - | - |
OCN | AAAI2022 | 31.43 | 25.80 | 33.11 | 65.3 | 67.1 | |
SCG+ODM | arXiv | 31.65 | 24.95 | 33.65 | - | - | - |
DT | CVPR2022 | 31.75 | 27.45 | 33.03 | 34.50 | 30.13 | 35.81 |
CATN (w/ Bert) | arXiv2022 | 31.86 | 25.15 | 33.84 | 34.44 | 27.69 | 36.45 |
CDN | NeurIPS2021 | 32.07 | 27.19 | 33.53 | 34.79 | 29.48 | 36.38 |
STIP | CVPR2022 | 32.22 | 28.15 | 33.43 | 35.29 | 31.43 | 36.45 |
DEFR | arXiv2021 | 32.35 | 33.45 | 32.02 | - | - | - |
CDN-s+HQM | ECCV2022 | 32.47 | 28.15 | 33.76 | - | - | - |
UPT | arXiv2021 | 32.62 | 28.62 | 33.81 | 36.08 | 31.41 | 37.47 |
Iwin | arXiv2022 | 32.79 | 27.84 | 35.40 | 35.84 | 28.74 | 36.09 |
SDT | arXiv2022 | 32.97 | 28.49 | 34.31 | 36.32 | 31.90 | 37.64 |
DOQ | CVPR2022 | 33.28 | 29.19 | 34.50 | - | - | - |
IF | CVPR2022 | 33.51 | 30.30 | 34.46 | 36.28 | 33.16 | 37.21 |
GEN-VLKT (w/ CLIP) | CVPR2022 | 34.95 | 31.18 | 36.08 | 38.22 | 34.36 | 39.37 |
ParMap | ECCV2022 | 35.15 | 33.71 | 35.58 | 37.56 | 35.87 | 38.06 |
QAHOI-Swin-Large-ImageNet-22K | arXiv2021 | 35.78 | 29.80 | 37.56 | 37.59 | 31.66 | 39.36 |
3) Ground Truth human-object pair boxes (only evaluating HOI recognition)
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) |
---|---|---|---|---|
iCAN | BMVC2018 | 33.38 | 21.43 | 36.95 |
Interactiveness | CVPR2019 | 34.26 | 22.90 | 37.65 |
Analogy | ICCV2019 | 34.35 | 27.57 | 36.38 |
ATL | CVPR2021 | 43.32 | 33.84 | 46.15 |
IDN | NeurIPS2020 | 43.98 | 40.27 | 45.09 |
ATL(w/ COCO) | CVPR2021 | 44.27 | 35.52 | 46.89 |
FCL | CVPR2021 | 45.25 | 36.27 | 47.94 |
GTNet | arXiv | 46.45 | 35.10 | 49.84 |
SCG | ICCV2021 | 51.53 | 41.01 | 54.67 |
K-BAN | arXiv2022 | 52.99 | 34.91 | 58.40 |
ConsNet | ACMMM2020 | 53.04 | 38.79 | 57.3 |
4) Interactiveness detection (interactive or not + pair box detection):
Method | Pub | HICO-DET | V-COCO |
---|---|---|---|
TIN++ | TPAMI2022 | 14.35 | 29.36 |
PPDM | CVPR2020 | 27.34 | - |
QPIC | CVPR2021 | 32.96 | 38.33 |
CDN | NeurIPS2021 | 33.55 | 40.13 |
ParMap | ECCV2022 | 38.74 | 43.61 |
5) Enhanced with HAKE:
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) | Full(ko) | Rare(ko) | Non-Rare(ko) |
---|---|---|---|---|---|---|---|
iCAN | BMVC2018 | 14.84 | 10.45 | 16.15 | 16.26 | 11.33 | 17.73 |
iCAN + HAKE-HICO-DET | CVPR2020 | 19.61 (+4.77) | 17.29 | 20.30 | 22.10 | 20.46 | 22.59 |
Interactiveness | CVPR2019 | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
Interactiveness + HAKE-HICO-DET | CVPR2020 | 22.12 (+5.09) | 20.19 | 22.69 | 24.06 | 22.19 | 24.62 |
Interactiveness + HAKE-Large | CVPR2020 | 22.66 (+5.63) | 21.17 | 23.09 | 24.53 | 23.00 | 24.99 |
6) Zero-Shot HOI detection:
Unseen action-object combination scenario (UC)
Method | Pub | Detector | Unseen(def) | Seen(def) | Full(def) |
---|---|---|---|---|---|
Shen et al. | WACV2018 | COCO | 5.62 | - | 6.26 |
Functional | AAAI2020 | HICO-DET | 11.31 ± 1.03 | 12.74 ± 0.34 | 12.45 ± 0.16 |
ConsNet | ACMMM2020 | COCO | 16.99 ± 1.67 | 20.51 ± 0.62 | 19.81 ± 0.32 |
VCL (NF-UC) | ECCV2020 | HICO-DET | 16.22 | 18.52 | 18.06 |
ATL(w/ COCO) (NF-UC) | CVPR2021 | HICO-DET | 18.25 | 18.78 | 18.67 |
FCL (NF-UC) | CVPR2021 | HICO-DET | 18.66 | 19.55 | 19.37 |
SCL | arXiv | HICO-DET | 21.73 | 25.00 | 24.34 |
GEN-VLKT*(NF-UC) | CVPR2022 | HICO-DET | 25.05 | 23.38 | 23.71 |
VCL (RF-UC) | ECCV2020 | HICO-DET | 10.06 | 24.28 | 21.43 |
ATL(w/ COCO) (RF-UC) | CVPR2021 | HICO-DET | 9.18 | 24.67 | 21.57 |
FCL (RF-UC) | CVPR2021 | HICO-DET | 13.16 | 24.23 | 22.01 |
SCL (RF-UC) | arXiv | HICO-DET | 19.07 | 30.39 | 28.08 |
GEN-VLKT*(RF-UC) | CVPR2022 | HICO-DET | 21.36 | 32.91 | 30.56 |
- * indicates large visual-language model pretraining, e.g., CLIP.
- For the details of each setting, please refer to the corresponding publications. This list is not officially published and might miss some publications.
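As a rough illustration of the unseen-combination (UC) scenario above, the sketch below checks a candidate split under the usual reading of UC: every held-out (verb, object) combination is absent from training, while its verb and its object each still appear in some seen combination. The function `split_is_valid_uc` and the example categories are hypothetical, and the exact splits used by each method are defined in the corresponding publications.

```python
def split_is_valid_uc(seen, unseen):
    """Check a seen/unseen split of (verb, object) combinations.

    Assumption: in UC, unseen combinations are disjoint from the seen
    ones, but every verb and every object in them is individually seen.
    """
    seen_verbs = {verb for verb, _ in seen}
    seen_objects = {obj for _, obj in seen}
    return seen.isdisjoint(unseen) and all(
        verb in seen_verbs and obj in seen_objects for verb, obj in unseen
    )


seen = {
    ("ride", "bicycle"),
    ("hold", "bicycle"),
    ("ride", "horse"),
    ("feed", "horse"),
}

# "hold" and "horse" are both seen, only their combination is new.
print(split_is_valid_uc(seen, {("hold", "horse")}))  # True

# "elephant" never appears in training, so this is not a UC split.
print(split_is_valid_uc(seen, {("ride", "elephant")}))  # False
```

A held-out object that never appears in training, as in the second call, corresponds instead to the unseen object scenario listed next.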
Unseen object scenario (UO)
Method | Pub | Detector | Full(def) | Seen(def) | Unseen(def) |
---|---|---|---|---|---|
Functional | AAAI2020 | HICO-DET | 13.84 | 14.36 | 11.22 |
FCL | CVPR2021 | HICO-DET | 19.87 | 20.74 | 15.54 |
ConsNet | ACMMM2020 | COCO | 20.71 | 20.99 | 19.27 |
Unseen action scenario (UA)
Method | Pub | Detector | Full(def) | Seen(def) | Unseen(def) |
---|---|---|---|---|---|
ConsNet | ACMMM2020 | COCO | 19.04 | 20.02 | 14.12 |
Another setting
Method | Pub | Unseen | Seen | Full |
---|---|---|---|---|
Shen et al. | WACV2018 | 5.62 | - | 6.26 |
Functional | AAAI2020 | 10.93 | 12.60 | 12.26 |
VCL | ECCV2020 | 10.06 | 24.28 | 21.43 |
ATL | CVPR2021 | 9.18 | 24.67 | 21.57 |
FCL | CVPR2021 | 13.16 | 24.23 | 22.01 |
THID (w/ CLIP) | CVPR2022 | 15.53 | 24.32 | 22.96 |
Ambiguous-HOI
Detector: COCO pre-trained
Method | mAP |
---|---|
iCAN | 8.14 |
Interactiveness | 8.22 |
Analogy(reproduced) | 9.72 |
DJ-RN | 10.37 |
OC-Immunity | 10.45 |
SWiG-HOI
Method | Pub | Non-Rare | Unseen | Seen | Full |
---|---|---|---|---|---|
JSR | ECCV2020 | 10.01 | 6.10 | 2.34 | 6.08 |
CHOID | ICCV2021 | 10.93 | 6.63 | 2.64 | 6.64 |
QPIC | CVPR2021 | 16.95 | 10.84 | 6.21 | 11.12 |
THID (w/ CLIP) | CVPR2022 | 17.67 | 12.82 | 10.04 | 13.26 |
V-COCO: Scenario1
1) Detector: COCO pre-trained or one-stage detector
Method | Pub | AP(role) |
---|---|---|
Gupta et al. | arXiv | 31.8 |
InteractNet | CVPR2018 | 40.0 |
Turbo | AAAI2019 | 42.0 |
GPNN | ECCV2018 | 44.0 |
iCAN | BMVC2018 | 45.3 |
Xu et al. | CVPR2019 | 45.9 |
Wang et al. | ICCV2019 | 47.3 |
UniDet | ECCV2020 | 47.5 |
Interactiveness | CVPR2019 | 47.8 |
Lin et al. | IJCAI2020 | 48.1 |
VCL | ECCV2020 | 48.3 |
Zhou et al. | CVPR2020 | 48.9 |
In-GraphNet | IJCAI-PRICAI 2020 | 48.9 |
Interactiveness-optimized | CVPR2019 | 49.0 |
TIN-PAMI | TPAMI2021 | 49.1 |
IP-Net | CVPR2020 | 51.0 |
DRG | ECCV2020 | 51.0 |
RGBM | arXiv2022 | 51.7 |
VSGNet | CVPR2020 | 51.8 |
PMN | arXiv | 51.8 |
PMFNet | ICCV2019 | 52.0 |
Liu et al. | arXiv | 52.28 |
FCL | CVPR2021 | 52.35 |
PD-Net | ECCV2020 | 52.6 |
Wang et al. | ECCV2020 | 52.7 |
PFNet | CVM | 52.8 |
Zou et al. | CVPR2021 | 52.9 |
SIGN | ICME2020 | 53.1 |
ACP | ECCV2020 | 52.98 (53.23) |
FCMNet | ECCV2020 | 53.1 |
HRNet | TIP2021 | 53.1 |
SGCN4HOI | arXiv2022 | 53.1 |
ConsNet | ACMMM2020 | 53.2 |
IDN | NeurIPS2020 | 53.3 |
SG2HOI | ICCV2021 | 53.3 |
OSGNet | IEEE Access | 53.43 |
SABRA-Res50 | arXiv | 53.57 |
K-BAN | arXiv2022 | 53.70 |
IPGN | TIP2021 | 53.79 |
AS-Net | CVPR2021 | 53.9 |
RR-Net | arXiv | 54.2 |
SCG | ICCV2021 | 54.2 |
SABRA-Res50FPN | arXiv | 54.69 |
GGNet | CVPR2021 | 54.7 |
MLCNet | ICMR2020 | 55.2 |
HOTR | CVPR2021 | 55.2 |
DIRV | AAAI2021 | 56.1 |
SABRA-Res152 | arXiv | 56.62 |
PhraseHOI | AAAI2022 | 57.4 |
GTNet | arXiv | 58.29 |
QPIC-Res101 | CVPR2021 | 58.3 |
QPIC-Res50 | CVPR2021 | 58.8 |
CATN (w/ fastText) | arXiv2022 | 60.1 |
Iwin | arXiv2022 | 60.85 |
UPT-ResNet-101-DC5 | arXiv2021 | 61.3 |
SDT | arXiv2022 | 61.8 |
MSTR | CVPR2022 | 62.0 |
IF | CVPR2022 | 63.0 |
ParMap | ECCV2022 | 63.0 |
QPIC-CPC | CVPR2022 | 63.1 |
DOQ | CVPR2022 | 63.5 |
GEN-VLKT (w/ CLIP) | CVPR2022 | 63.58 |
QPIC+HQM | ECCV2022 | 63.6 |
CDN | NeurIPS2021 | 63.91 |
SSRT | CVPR2022 | 65.0 |
STIP | CVPR2022 | 66.0 |
DT | CVPR2022 | 66.2 |
2) Enhanced with HAKE:
Method | Pub | AP(role) |
---|---|---|
iCAN | BMVC2018 | 45.3 |
iCAN + HAKE-Large (transfer learning) | CVPR2020 | 49.2 (+3.9) |
Interactiveness | CVPR2019 | 47.8 |
Interactiveness + HAKE-Large (transfer learning) | CVPR2020 | 51.0 (+3.2) |
HOI-COCO:
based on V-COCO
Method | Pub | Full | Seen | Unseen |
---|---|---|---|---|
VCL | ECCV2020 | 23.53 | 8.29 | 35.36 |
ATL(w/ COCO) | CVPR2021 | 23.40 | 8.01 | 35.34 |
HICO
1) Default
Method | mAP |
---|---|
R*CNN | 28.5 |
Girdhar et al. | 34.6 |
Mallya et al. | 36.1 |
Pairwise | 39.9 |
RelViT | 40.12 |
DEFR-base | 44.1 |
DEFR-CLIP | 60.5 |
DEFR/16 CLIP | 65.6 |
2) Enhanced with HAKE:
Method | mAP |
---|---|
Mallya et al. | 36.1 |
Mallya et al.+HAKE-HICO | 45.0 (+8.9) |
Pairwise | 39.9 |
Pairwise+HAKE-HICO | 45.9 (+6.0) |
Pairwise+HAKE-Large | 46.3 (+6.4) |