deep-action-proposals DAPs paper understanding problem


Open zycironic opened this issue 6 years ago • 4 comments

I don't understand how many proposals are generated in total for one video. During training, only one stream is processed per video, so the overall number of proposals is K. Is that true?

May 07 '18 07:05 zycironic

Thanks for your interest in our work.

K is the number of proposals produced after reasoning about a segment of length T. In DAPs, we slide the model and apply it to multiple chunks of length T. If you need more proposals, you can reduce the stride. Effectively, you are still processing the video stream only once.
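A minimal sketch of the counting described above, not taken from the DAPs code. It assumes the model is applied only to full windows of length T (no padding at the end of the video); the function name `count_proposals` and its parameters are hypothetical.

```python
def count_proposals(num_frames, T, stride, K):
    """Return (number of windows, total proposals) for one pass over a video.

    Assumption: only full windows of length T are evaluated, and each
    window yields exactly K proposals.
    """
    if num_frames < T:
        return 0, 0
    num_windows = (num_frames - T) // stride + 1
    return num_windows, num_windows * K


# Halving the stride roughly doubles the number of proposals.
print(count_proposals(1000, 256, 128, 64))
print(count_proposals(1000, 256, 64, 64))
```

Under these assumptions the total grows with the number of windows, while each frame of the stream is still read in a single left-to-right pass.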

May 07 '18 09:05 escorciav

If I slide the model n times, will there be n*K anchor segments, each matching one of the K prior anchor segments?

May 07 '18 10:05 zycironic

I'm sorry for your confusion.

The anchors spanned by the model belong to the time interval T. If you slide it by delta, the following K predictions will be in the interval [delta, T+delta]. In other words, the anchors are parametrized in terms of T.

Please take a look at the inference code. It is easy to follow there.
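A rough sketch of the sliding step, not the repository's inference code. It assumes each window returns K segments as (start, end) fractions of T, which is a hypothetical encoding chosen for illustration; `to_absolute`, `slide`, and `predict` are made-up names.

```python
def to_absolute(segments, offset, T):
    """Map window-relative (start, end) fractions of T onto video time."""
    return [(offset + s * T, offset + e * T) for s, e in segments]


def slide(num_frames, T, delta, predict):
    """Apply `predict` to every full window and collect absolute segments.

    `predict(offset)` stands in for the model: it returns the K relative
    segments for the window [offset, offset + T].
    """
    proposals = []
    for offset in range(0, num_frames - T + 1, delta):
        proposals.extend(to_absolute(predict(offset), offset, T))
    return proposals


# Dummy predictor that always proposes the whole window (K = 1).
print(slide(512, 256, 128, lambda offset: [(0.0, 1.0)]))
```

The same K relative anchors are reused at every position; only the offset changes, so a window shifted by delta yields segments inside [delta, T+delta].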

May 08 '18 11:05 escorciav

Hello @escorciav, I'm curious about how you prepare your training data. In the network, the structure is predefined (the output dimension is always K, giving K proposals). However, during training, for an audio clip of length T, what if the number of activities is less than K? How do you assign the ground-truth labels? Thanks a lot. This work is really interesting.

Jul 23 '18 22:07 TianweiXing
