
Details about the frame generation

Open lyyang01 opened this issue 4 years ago • 7 comments

Hello! Thanks for your great work and the code. I am learning a lot. However, I am still confused about the generation of video frames. The video features are provided in your work, and I noted that the feature length is 40 for each video. Does this mean that you sample 40 frames from every video and extract their features with SF or TSN?

lyyang01 avatar Jul 20 '21 08:07 lyyang01

Hi! Thank you for your interest.

Yes, we extracted 40 feature frames for every video using SF and TSN.

FYI, each feature frame represents 0.25 seconds, so the 40 features together cover 10 seconds. Videos shorter than 10 seconds were also processed into 40-frame feature sequences by padding. For instance, a 5-second-long video is represented by 20 frames of video features and 20 frames of padding.
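To make the padding concrete, here is a minimal sketch. The zero-valued padding and the 2304-dimensional SlowFast feature size are assumptions for illustration; the released feature files may differ in detail.

```python
import numpy as np

FEATURES_PER_SECOND = 4   # one feature frame per 0.25 s
MAX_FRAMES = 40           # 10 s * 4 feature frames per second

def pad_features(features: np.ndarray) -> np.ndarray:
    """Pad a (T, D) feature array with zero frames up to MAX_FRAMES.

    A 5-second video yields T = 20 real feature frames, so 20 padding
    frames are appended to reach the fixed length of 40.
    """
    num_frames, feat_dim = features.shape
    if num_frames >= MAX_FRAMES:
        return features[:MAX_FRAMES]
    padding = np.zeros((MAX_FRAMES - num_frames, feat_dim), dtype=features.dtype)
    return np.concatenate([features, padding], axis=0)

# Example: a 5-second video with hypothetical 2304-dim SlowFast features
feats_5s = np.random.randn(5 * FEATURES_PER_SECOND, 2304).astype(np.float32)
padded = pad_features(feats_5s)   # shape: (40, 2304)
```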

I hope this helps you understand.

Best, Jinwoo Kim

hello-jinwoo avatar Jul 20 '21 13:07 hello-jinwoo

Hi @hello-jinwoo, thanks for your reply. I have a follow-up question: how did you pre-train the TSP features on ActivityNet? Could you share the details with us?

guuzaa avatar Aug 31 '21 08:08 guuzaa

Hi, thank you for your interest in our work.

We used the TSP network with an R(2+1)D-34 backbone pre-trained on ActivityNet by the original authors. You can find the weights here.

pplntech avatar Sep 02 '21 02:09 pplntech

Thanks for your reply. I will check this link soon.

guuzaa avatar Sep 02 '21 06:09 guuzaa

Hi @pplntech and @hello-jinwoo, re your comment about each feature frame representing 0.25 seconds: how is this possible, considering the original pre-trained SlowFast R50 model is trained on 2-second clips? I'm assuming you used 2-second input clips with a 0.25-second stride sliding across the 10-second video; can you confirm that's correct?
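For reference, a minimal sketch of the sliding-window scheme this question assumes (2-second clips stepped by 0.25 seconds); it only illustrates the assumption being asked about, not a confirmed detail of the authors' pipeline:

```python
def clip_windows(video_len_s: float = 10.0, clip_len_s: float = 2.0,
                 stride_s: float = 0.25):
    """Yield (start, end) times of clip_len_s-second clips stepped by stride_s.

    Under the assumption in the question, each clip would be fed to the
    pre-trained SlowFast model and its feature assigned to one 0.25-second
    slot of the 40-frame sequence.
    """
    t = 0.0
    while t + clip_len_s <= video_len_s + 1e-9:
        yield (round(t, 2), round(t + clip_len_s, 2))
        t += stride_s

windows = list(clip_windows())
print(len(windows))   # 33 fully interior windows for a 10-second video
print(windows[:3])    # [(0.0, 2.0), (0.25, 2.25), (0.5, 2.5)]
```

Note that stepping this way yields 33 fully interior windows rather than 40, so how the clip boundaries are handled is exactly the kind of detail the question asks the authors to confirm.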

tullie avatar Oct 09 '21 16:10 tullie

Hi @hello-jinwoo, thanks for sharing the code. I have some questions about the difference between the SF_TSN_interpolated feature and the SF_TSN_padded feature. Looking forward to your reply.

sqiangcao99 avatar Apr 04 '22 02:04 sqiangcao99

Hi.

Thanks for your attention.

We either interpolate the features of shorter-than-10-second videos or pad them with zeros to make them 10 seconds long (in our case, 40 frames).
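To make the difference concrete, here is a minimal sketch of the interpolation variant (the padding variant is the zero padding sketched earlier in the thread). Linear resampling along the time axis is an assumption; the released features may have been resized differently.

```python
import numpy as np
import torch
import torch.nn.functional as F

MAX_FRAMES = 40  # 10 s at one feature frame per 0.25 s

def interpolate_features(features: np.ndarray) -> np.ndarray:
    """Linearly resample a (T, D) feature array along time to (MAX_FRAMES, D).

    Unlike zero padding, every output position carries (interpolated)
    video content, so a 5-second video is stretched to fill all 40 frames.
    """
    x = torch.from_numpy(features).T.unsqueeze(0)            # (1, D, T)
    x = F.interpolate(x, size=MAX_FRAMES, mode="linear",
                      align_corners=False)                   # (1, D, 40)
    return x.squeeze(0).T.numpy()                            # (40, D)

feats_5s = np.random.randn(20, 2304).astype(np.float32)      # 5 s of real features
stretched = interpolate_features(feats_5s)                   # shape: (40, 2304)
```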

Best regards, Jinwoo Kim


hello-jinwoo avatar Apr 07 '22 03:04 hello-jinwoo