
Questions about the code for JHMDB

Open wenzhengzeng opened this issue 2 years ago • 4 comments

Thanks for the great work. I have read the code for JHMDB and have some questions: (1) The [email protected] I get is just 0.72, much lower than the reported 82.3. (2) The provided evaluation code for JHMDB appears to compute frame-mAP rather than video-mAP, because the AP is calculated at the frame level rather than the tubelet level. (3) Although the number of queries is defined as 10*clip_len, only the predictions of the queries corresponding to the intermediate frame (key_pos) are extracted as the final prediction during both training and testing. In other words, the pipeline behaves more like video object detection: the input is a video clip, but the goal is only to predict the objects and their classes in the middle frame of that clip. I could not find any place in the code that reflects the properties of the so-called tubelet transformer. In summary, are some configurations wrong in the current code?
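To make point (3) concrete, here is a minimal sketch of the behavior as I understand it. It is not copied from the repository: the function name, the frame-major layout of the queries, and the use of `clip_len // 2` as `key_pos` are all my assumptions for illustration.

```python
def select_key_frame_queries(predictions, clip_len, queries_per_frame=10, key_pos=None):
    """Keep only the predictions of the queries assigned to the key frame.

    Assumption: `predictions` is a flat list of length
    clip_len * queries_per_frame, ordered frame by frame
    (all queries of frame 0, then all queries of frame 1, ...).
    """
    if key_pos is None:
        # Assumption: the key frame is the middle frame of the clip.
        key_pos = clip_len // 2
    if len(predictions) != clip_len * queries_per_frame:
        raise ValueError("expected clip_len * queries_per_frame predictions")
    start = key_pos * queries_per_frame
    return predictions[start:start + queries_per_frame]


# Toy example: each "prediction" is just a (frame_index, query_index) pair.
clip_len = 8
preds = [(frame, q) for frame in range(clip_len) for q in range(10)]
kept = select_key_frame_queries(preds, clip_len)
```

If the code really works like this, only 10 of the 10*clip_len predictions survive and they all belong to one frame, so no box is linked across frames into a tubelet.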

wenzhengzeng, Sep 03 '22 13:09