CATER

Actions per frame

Open · Rajawat23 opened this issue · 3 comments

Hi authors, thanks for your work. I tried generating 3 videos to test the dataset. While actions_order_dataset seems to return the frame, label, and classes, the output file (train.txt) under the folder action_order_uniq contains no such information.

Instead, it contains lines like this: /images/CLEVR_new_000002.avi 53,54,60,69,70,71,72,74,77,78,81,83,129,138,144,153,155,156,157,161,162,165,167,173,179,187,188,195,197,198,200,203,204,207,209,257,263,264,265,270,272,279,281,282,284,287,288,291,292,293,381,382,383,387,389,390,392,396,398,405,407,408,410,411,412,413,414,415,417,419,423,425,430,431,432,434,438,440,447,449,450,452,455,456,459,460,461,465,471,474,480,489,490,491,492,495,497,498,501,502,509,515,518,524,532,533,536,539,545,549,551,555,557,558,560,564,565,566,573,575,576,577,578,580,581,582,585,586,587

How can I get frame-by-frame actions and classes?

Rajawat23, Feb 09 '21 11:02

Hi, thanks for your interest. The actions_order task is a multi-label classification task: we pre-define a set of action-order classes, and the list you see contains the indices of the classes that are active at some point in the video.
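A minimal sketch of reading one such line into a video path and its list of active class indices. It assumes the format shown in the question (a path with no spaces, whitespace, then comma-separated integers); `parse_label_line` is a hypothetical helper, not part of the CATER code.

```python
from typing import List, Tuple


def parse_label_line(line: str) -> Tuple[str, List[int]]:
    """Split one train.txt line into (video_path, active_class_indices).

    Assumption: the path contains no whitespace and is followed by a
    comma-separated list of class indices, as in the example line above.
    """
    parts = line.strip().split(None, 1)  # split once on the first whitespace run
    path = parts[0]
    labels = parts[1] if len(parts) > 1 else ""
    indices = [int(tok) for tok in labels.split(",") if tok]
    return path, indices


path, indices = parse_label_line("/images/CLEVR_new_000002.avi 53,54,60,69")
print(path)     # /images/CLEVR_new_000002.avi
print(indices)  # [53, 54, 60, 69]
```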

To get actions active at any given frame, you should be able to use the movements metadata, like this.
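The linked example is not reproduced here, so the following is only a sketch under an assumed metadata layout: `movements` maps each object name to a list of entries shaped like `[action, target, start_frame, end_frame]`, and the frame interval is treated as half-open. Check both assumptions against the actual scene metadata files; `actions_at_frame` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def actions_at_frame(movements: Dict[str, list], frame: int) -> List[Tuple[str, str]]:
    """Return the (object, action) pairs active at `frame`.

    ASSUMPTION: each movement entry is [action, target, start_frame, end_frame]
    and an action is active for start_frame <= frame < end_frame.
    """
    active = []
    for obj, moves in movements.items():
        for move in moves:
            action, start, end = move[0], move[-2], move[-1]
            if start <= frame < end:
                active.append((obj, action))
    return active


# Hypothetical metadata, for illustration only.
movements = {
    "Cone_0": [["_slide", [1.0, 2.0, 0.0], 10, 30]],
    "Sphere_0": [["_rotate", None, 25, 40]],
}
print(actions_at_frame(movements, 27))  # [('Cone_0', '_slide'), ('Sphere_0', '_rotate')]
print(actions_at_frame(movements, 5))   # []
```

Looping this over every frame index gives a frame-by-frame action list; if the real files mark end frames as inclusive, change `<` to `<=`.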

rohitgirdhar, Feb 16 '21 17:02

Am I right in assuming that the whole 10-second video is the input and the whole list of classes is the label? That is, the output is a 301-length vector describing whether each class was present at any time in the 10-second video.

Ramtin-Nouri, Feb 02 '23 19:02

Yes that is correct.
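A minimal sketch of turning the list of active class indices into the multi-hot target vector described above, assuming 0-based indices. The thread gives 301 classes, but the example line in the original post contains indices above 300 (up to 587), so check the class count of the label file you are actually using before fixing `num_classes`. `to_multi_hot` is a hypothetical helper; it raises on out-of-range indices so a mismatch is caught rather than silently dropped.

```python
from typing import Iterable, List

NUM_CLASSES = 301  # class count stated in this thread; verify for your label file


def to_multi_hot(indices: Iterable[int], num_classes: int = NUM_CLASSES) -> List[int]:
    """Build a 0/1 vector with a 1 at every active (0-based) class index."""
    vec = [0] * num_classes
    for i in indices:
        if not 0 <= i < num_classes:
            raise ValueError(f"class index {i} outside [0, {num_classes})")
        vec[i] = 1
    return vec


print(to_multi_hot([0, 2], num_classes=4))  # [1, 0, 1, 0]
```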

rohitgirdhar, Feb 03 '23 21:02