object label?

Open lumiaomiao opened this issue 3 years ago • 8 comments

Hi, could you explain the * in Table 3 of ATL? You described it as "* means we only use the boxes of the detection results", but how do you use the category of the detection results in the training phase and the inference phase?

lumiaomiao avatar Oct 14 '21 08:10 lumiaomiao

Sorry for the confusion.

The object detection results provide both object category information and bounding boxes. Here, we only use the bounding boxes for inferring the HOI category. The training phase is the same as the previous setting. In fact, * means we use the same model as ATL, but do not use the object category information during inference.

Feel free to contact me if you have further questions.

Regards,

zhihou7 avatar Oct 14 '21 08:10 zhihou7

Thank you for your reply.

lumiaomiao avatar Oct 18 '21 12:10 lumiaomiao

@zhihou7 Hi, I have another question about the code. The function get_new_Trainval_N in lib/ult/ult.py is defined as: [image]

Why does it use Trainval_N[4] rather than Trainval_N[k]?

lumiaomiao avatar Oct 20 '21 06:10 lumiaomiao

Thanks for your comment. It should be Trainval_N[k]. It is a bug inherited from the VCL code that I forgot to fix. After fixing this bug, the performance will improve a bit. This bug also does not add seen classes in the zero-shot setting. Therefore, it affects the performance only slightly.

I have updated the code.

Thanks.

zhihou7 avatar Oct 20 '21 06:10 zhihou7
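
The function body in question was shared as a screenshot and is not reproduced in this thread. As a rough illustration of the kind of indexing bug discussed above, here is a minimal sketch. It assumes, hypothetically, that Trainval_N is a dict mapping a key to a list of samples whose second element is an HOI label list, and that the function filters out samples containing unseen labels. The names filter_buggy, filter_fixed, and the toy data are invented for this example and are not taken from the repository.

```python
# Hypothetical data layout: key -> list of (sample_id, hoi_label_list).
# This is an assumption for illustration, not the repository's actual format.

def filter_buggy(trainval_n, unseen):
    """Filter with the hard-coded index: every key reads the samples of key 4."""
    out = {}
    for k in trainval_n:
        # Bug: trainval_n[4] is used regardless of the current key k.
        out[k] = [s for s in trainval_n[4] if not set(s[1]) & unseen]
    return out


def filter_fixed(trainval_n, unseen):
    """Filter each key's own samples, as in the Trainval_N[k] fix."""
    out = {}
    for k in trainval_n:
        out[k] = [s for s in trainval_n[k] if not set(s[1]) & unseen]
    return out


data = {4: [(0, [10])], 7: [(1, [20]), (2, [30])]}
unseen = {30}

buggy = filter_buggy(data, unseen)
fixed = filter_fixed(data, unseen)
print(buggy)
print(fixed)
```

In this toy case the buggy version copies the samples of key 4 into every key, while the fixed version keeps each key's own samples and drops only those that contain an unseen label.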

Thank you for your quick reply.

lumiaomiao avatar Oct 20 '21 06:10 lumiaomiao

@zhihou7 In the following code, if an image contains two pairs <h1, v1, o1> and <h1, v2, o1>, and the first one is in the unseen composition list, then both pairs are deleted from the training data. Why not delete only the first one? In my view, deleting only the first one is closer to the description in your paper. [image]

lumiaomiao avatar Oct 21 '21 02:10 lumiaomiao

Here, GT[1] is the HOI label list of an HOI sample, e.g., [eat apple, hold apple]. If "eat apple" is an unseen category, I think it is fair to remove the whole HOI sample rather than only the annotation [eat apple]. Otherwise, the "eat apple" sample would still exist in the training data but would be unlabeled, which I think differs from the zero-shot setting.

zhihou7 avatar Oct 21 '21 02:10 zhihou7
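
The two options discussed above, removing the whole sample versus removing only the unseen annotation, can be contrasted with a small sketch. It assumes each sample is a tuple whose element at index 1 is the HOI label list, mirroring the GT[1] convention mentioned in the reply. The function names and toy data are hypothetical and not taken from the repository.

```python
# Each sample: (sample_id, hoi_label_list). Index 1 mirrors GT[1] in the thread;
# everything else here is an illustrative assumption.

def drop_samples_with_unseen(samples, unseen):
    """Sample-level removal: discard any sample that carries an unseen label."""
    return [s for s in samples if not set(s[1]) & unseen]


def strip_unseen_labels(samples, unseen):
    """Annotation-level removal: keep the sample, delete only the unseen labels."""
    stripped = []
    for s in samples:
        kept = [label for label in s[1] if label not in unseen]
        if kept:  # a sample with no remaining labels is dropped entirely
            stripped.append((s[0], kept))
    return stripped


samples = [
    ("pair_1", ["eat apple", "hold apple"]),
    ("pair_2", ["hold apple"]),
]
unseen = {"eat apple"}

dropped = drop_samples_with_unseen(samples, unseen)
stripped = strip_unseen_labels(samples, unseen)
print(dropped)
print(stripped)
```

With annotation-level removal, pair_1 stays in the training set labeled only as "hold apple", so the model still sees an "eat apple" interaction without its label. Sample-level removal avoids that, at the cost of also losing the seen "hold apple" annotation on that pair.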

I get it, thank you.

lumiaomiao avatar Oct 21 '21 03:10 lumiaomiao