gen-vlkt

Zero-Shot HOI Detection Setting

Open hutuo1213 opened this issue 1 year ago • 0 comments

Hi, we have recently been considering a two-stage method for zero-shot HOI detection, but we are confused about some details of the setup. My understanding is that the text features of the 600 HOI categories are compared with the final interaction representation through a similarity computation, and the resulting scores give the interaction category.
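To make the question concrete, here is a minimal sketch of what I mean by "similarity computation". It assumes cosine similarity between one interaction representation and one text feature per category; the function names, the `scale` parameter, and the toy 3-dimensional features are hypothetical and are not taken from the GEN-VLKT code.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_logits(interaction_feat, text_feats, scale=1.0):
    # One score per category: cosine similarity between the interaction
    # representation and that category's text feature, times an assumed scale.
    q = l2_normalize(interaction_feat)
    return [scale * sum(a * b for a, b in zip(q, l2_normalize(t)))
            for t in text_feats]

def predict_category(interaction_feat, text_feats):
    # The predicted category is the one whose text feature is most similar.
    logits = cosine_logits(interaction_feat, text_feats)
    return max(range(len(logits)), key=logits.__getitem__)

# Toy stand-in: 3 categories instead of 600, 3-dimensional features.
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
interaction_feat = [0.1, 0.9, 0.2]
pred = predict_category(interaction_feat, text_feats)
```

In the real setting there would be 600 text features, one per HOI category, and the interaction representation would come from the second stage.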

  1. Unseen combination (UC): all verbs and objects have been seen during training, only some verb-object combinations have not. A two-stage method seems able to handle this case directly. Or is the "similarity computation" still used here?

  2. Unseen object (UO): a few objects have not been seen during training. We found that in GEN-VLKT (hico_text_label.py), only the interaction categories related to the unseen objects are listed. So, are the unseen objects still used as ground truths (GTs) when training the object detection task? Does this setting still use the "similarity computation"?

  3. Unseen verb (UV): a few verbs have not been seen during training. Does this setting still use the "similarity computation"?
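For reference, this is how I currently imagine the three settings could share one similarity-based classifier. It is only my assumption, not something I have confirmed from the GEN-VLKT code: during training the scores of unseen categories are dropped, and at inference all categories are scored. The function name and the `unseen_ids` list are hypothetical.

```python
def select_logits(logits, unseen_ids, training):
    # Assumed scheme: at inference every category is scored; during training
    # the categories marked as unseen are removed before the loss is computed.
    if not training:
        return list(range(len(logits))), list(logits)
    unseen = set(unseen_ids)
    kept = [i for i in range(len(logits)) if i not in unseen]
    return kept, [logits[i] for i in kept]

# Toy example: 4 categories, category 1 is treated as unseen.
logits = [0.2, 0.7, 0.1, 0.5]
train_ids, train_logits = select_logits(logits, unseen_ids=[1], training=True)
test_ids, test_logits = select_logits(logits, unseen_ids=[1], training=False)
```

If this is right, UC, UO, and UV would differ only in which category indices go into `unseen_ids`, and my question is whether that matches the intended setup.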

hutuo1213, Nov 27 '23 15:11