deep-action-proposals
DAPs paper understanding problem
I don't understand how many proposals are generated in total for one video. During training, only one stream is processed per video, so the overall number of proposals is K. Is that true?
Thanks for your interest in our work.
K is the number of proposals produced after reasoning about a segment of length T. In DAPs, we slide the model and apply it to multiple chunks of length T. If you need more proposals, you can reduce the stride. Effectively, you are still processing the video stream only once.
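To make the counting concrete, here is a minimal sketch of how the total number of proposals grows as the stride shrinks. It is not taken from the DAPs code: the function name `count_proposals` and its parameters are hypothetical, and it assumes that only full windows of length T are evaluated and that each window yields exactly K proposals.

```python
def count_proposals(num_frames, T, stride, K):
    """Return (num_windows, total_proposals) for a single pass over a video.

    Assumption: windows of length T start at 0, stride, 2*stride, ...
    and only windows that fit entirely inside the video are used.
    """
    if num_frames < T:
        # The video is shorter than one window, so no full window fits.
        return 0, 0
    num_windows = (num_frames - T) // stride + 1
    return num_windows, num_windows * K


# Halving the stride roughly doubles the number of proposals.
print(count_proposals(num_frames=1000, T=256, stride=128, K=64))
print(count_proposals(num_frames=1000, T=256, stride=64, K=64))
```

Under these assumptions, a video processed with n windows yields n*K proposals in total, even though each frame of the stream is read only once per window that covers it.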
If I slide the model n times, will there be n*K anchor segments matching the prior K anchor segments?
I'm sorry for your confusion.
The anchors spanned by the model belong to a time interval of length T. If you slide it by delta, the next K predictions will lie in the interval [delta, T+delta]. In other words, the anchors are parametrized in terms of T.
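The shift described above can be sketched as follows. This is an illustration only, not the repository's inference code: the helper `to_video_coordinates` is hypothetical, and it assumes each predicted segment is stored as a (start, end) pair normalized to [0, 1] relative to the window length T.

```python
def to_video_coordinates(local_segments, window_start, T):
    """Map window-relative segments to absolute positions in the video.

    Assumption: each segment is (start, end) with values in [0, 1],
    expressed as a fraction of the window length T.
    """
    return [
        (window_start + start * T, window_start + end * T)
        for start, end in local_segments
    ]


# The same K relative segments, placed in two consecutive windows.
segments = [(0.0, 0.5), (0.25, 1.0)]
first_window = to_video_coordinates(segments, window_start=0, T=256)
second_window = to_video_coordinates(segments, window_start=128, T=256)
```

With a shift of delta = 128, every segment of the second window falls inside [delta, T+delta] = [128, 384], while its relative parametrization is unchanged.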
Please take a look at the inference code. It is simple to follow there.
Hello @escorciav, I'm curious about how you prepare your training data. In the network, the structure is pre-defined (the output dimension is always K, giving K proposals). However, during training, for an audio clip of length T, what if the number of activities is less than K? How do you assign the ground-truth labels? Thanks a lot. This work is really interesting.