CATER
Understanding TSN setup
In Table 3 of the paper, it looks like you used 1 or 3 frames for the TSN experiments. What does that mean? Why did you train using 3 frames and test using 250 frames? Task 3 is really challenging, and it doesn't make sense to me to solve it using only 3 frames, so I must be misunderstanding the setup. Does it mean you sampled 3 frames per segment? If so, how many segments are used, and how many frames in total are seen at training time?
Also, what is the detailed setup for TSN+LSTM? It appears that you used 10 clips for the LSTM on the 3D models, but with TSN did you still use 10 "frames"? If not, how did you set it up for TSN?
Lastly, do you have any plans to release the TSN code?
Thanks a lot for your awesome research!!
Hi @kiyoon, thanks for your interest.
- For TSN, I used 1 frame from each of 3 segments at training time. However, at test time you can still run the network on any number of uniformly sampled frames.
- For TSN, I use 250 segments at test time (as is also done for average pooling).
- Unfortunately, I've been a bit swamped, so I was only able to release the R3D part, since that performed better anyway. However, I'll keep it in mind if I get a chance.
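
For anyone reading along, here is a minimal sketch of the sampling described in the first two points above: 1 random frame from each of 3 segments at training time, and 250 uniformly spaced frames at test time, with per-frame scores averaged into a video-level prediction. This is not the author's code (the TSN part has not been released). The function names, the 300-frame video length in the usage lines, and the choice of segment centers at test time are all assumptions made for illustration.

```python
import random


def sample_train_indices(num_frames, num_segments=3, rng=None):
    """Pick one random frame index from each of `num_segments` equal segments.

    Hypothetical helper: mirrors "1 frame each from 3 segments" from the reply.
    """
    rng = rng or random.Random()
    indices = []
    for s in range(num_segments):
        start = (s * num_frames) // num_segments
        end = ((s + 1) * num_frames) // num_segments  # exclusive bound
        end = max(end, start + 1)  # guard against empty segments on short videos
        indices.append(min(rng.randrange(start, end), num_frames - 1))
    return indices


def sample_test_indices(num_frames, num_samples=250):
    """Uniformly spaced frame indices, one at the center of each segment.

    Assumption: the center is used; the reply only says "uniformly sampled".
    """
    return [
        min(int((i + 0.5) * num_frames / num_samples), num_frames - 1)
        for i in range(num_samples)
    ]


def average_scores(per_frame_scores):
    """Average per-frame class scores into a single video-level score vector."""
    n = len(per_frame_scores)
    return [sum(column) / n for column in zip(*per_frame_scores)]


# Usage with an assumed 300-frame video:
train_idx = sample_train_indices(300, num_segments=3, rng=random.Random(0))
test_idx = sample_test_indices(300, num_samples=250)
```

The point of the sketch is that the network only ever scores single frames, so the number of frames can differ between training and testing: at training time 3 frames are seen per video per iteration, while at test time the same per-frame model is applied to 250 frames and the scores are pooled.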