
Details about the frame generation

Open lyyang01 opened this issue 4 years ago • 7 comments

Hello! Thanks for your great work and the code. I am learning a lot. However, I am still confused about the generation of video frames. The video features are provided in your work, and I noted that the feature length is 40 for each video. Does this mean that you sample 40 frames from every video and extract their features with SF or TSN?

lyyang01 avatar Jul 20 '21 08:07 lyyang01

Hi! Thank you for your interest.

Yes, we extracted 40 feature frames for every video using SF and TSN.

FYI, each feature frame represents 0.25 seconds, so the 40 features together cover 10 seconds. Videos shorter than 10 seconds were also processed into 40-frame feature sequences by padding. For instance, a 5-second-long video is represented by 20 frames of video features and 20 frames of padding.
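To make the padding concrete, here is a minimal sketch. The zero-valued padding and the 2304-dimensional SlowFast feature size are assumptions for illustration; the released feature files may differ in detail.

```python
import numpy as np

FEATURES_PER_SECOND = 4   # one feature frame per 0.25 s
MAX_FRAMES = 40           # 10 s * 4 feature frames per second

def pad_features(features: np.ndarray) -> np.ndarray:
    """Pad a (T, D) feature array with zero frames up to MAX_FRAMES.

    A 5-second video yields T = 20 real feature frames, so 20 padding
    frames are appended to reach the fixed length of 40.
    """
    num_frames, feat_dim = features.shape
    if num_frames >= MAX_FRAMES:
        return features[:MAX_FRAMES]
    padding = np.zeros((MAX_FRAMES - num_frames, feat_dim), dtype=features.dtype)
    return np.concatenate([features, padding], axis=0)

# Example: a 5-second video with hypothetical 2304-dim SlowFast features
feats_5s = np.random.randn(5 * FEATURES_PER_SECOND, 2304).astype(np.float32)
padded = pad_features(feats_5s)   # shape: (40, 2304)
```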

I hope this helps you understand.

Best, Jinwoo Kim

hello-jinwoo avatar Jul 20 '21 13:07 hello-jinwoo

Hi @hello-jinwoo, thanks for your reply. I have a follow-up question: how did you pre-train the TSP features on ActivityNet? Could you share the details with us?

guuzaa avatar Aug 31 '21 08:08 guuzaa

Hi, thank you for your interest in our work.

We used the TSP network with an R(2+1)D-34 backbone pre-trained on ActivityNet by the original authors. You can find the weights here.

pplntech avatar Sep 02 '21 02:09 pplntech

Thanks for your reply. I will check this link soon.

guuzaa avatar Sep 02 '21 06:09 guuzaa

Hi @pplntech and @hello-jinwoo, re your comment about each feature frame representing 0.25 seconds: how is this possible, considering the original pre-trained SlowFast R50 model is trained on 2-second clips? I'm assuming you used 2-second input clips with a 0.25-second stride sliding across the 10-second video; can you confirm that's correct?
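For reference, a minimal sketch of the sliding-window scheme this question assumes (2-second clips stepped by 0.25 seconds); it only illustrates the assumption being asked about, not a confirmed detail of the authors' pipeline:

```python
def clip_windows(video_len_s: float = 10.0, clip_len_s: float = 2.0,
                 stride_s: float = 0.25):
    """Yield (start, end) times of clip_len_s-second clips stepped by stride_s.

    Under the assumption in the question, each clip would be fed to the
    pre-trained SlowFast model and its feature assigned to one 0.25-second
    slot of the 40-frame sequence.
    """
    t = 0.0
    while t + clip_len_s <= video_len_s + 1e-9:
        yield (round(t, 2), round(t + clip_len_s, 2))
        t += stride_s

windows = list(clip_windows())
print(len(windows))   # 33 fully interior windows for a 10-second video
print(windows[:3])    # [(0.0, 2.0), (0.25, 2.25), (0.5, 2.5)]
```

Note that stepping this way yields 33 fully interior windows rather than 40, so how the clip boundaries are handled is exactly the kind of detail the question asks the authors to confirm.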

tullie avatar Oct 09 '21 16:10 tullie

Hi @hello-jinwoo, thanks for sharing the code. I have some questions about the difference between the SF_TSN_interpolated feature and the SF_TSN_padded feature. Looking forward to your reply.

sqiangcao99 avatar Apr 04 '22 02:04 sqiangcao99

Hi.

Thanks for your attention.

We either interpolate the features of shorter-than-10-second videos or pad them with zeros to make them 10 seconds long (in our case, 40 frames).
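To make the difference concrete, here is a minimal sketch of the interpolation variant (the padding variant is the zero padding sketched earlier in the thread). Linear resampling along the time axis is an assumption; the released features may have been resized differently.

```python
import numpy as np
import torch
import torch.nn.functional as F

MAX_FRAMES = 40  # 10 s at one feature frame per 0.25 s

def interpolate_features(features: np.ndarray) -> np.ndarray:
    """Linearly resample a (T, D) feature array along time to (MAX_FRAMES, D).

    Unlike zero padding, every output position carries (interpolated)
    video content, so a 5-second video is stretched to fill all 40 frames.
    """
    x = torch.from_numpy(features).T.unsqueeze(0)            # (1, D, T)
    x = F.interpolate(x, size=MAX_FRAMES, mode="linear",
                      align_corners=False)                   # (1, D, 40)
    return x.squeeze(0).T.numpy()                            # (40, D)

feats_5s = np.random.randn(20, 2304).astype(np.float32)      # 5 s of real features
stretched = interpolate_features(feats_5s)                   # shape: (40, 2304)
```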

Best regards, Jinwoo Kim


hello-jinwoo avatar Apr 07 '22 03:04 hello-jinwoo