video-classification

Tensor sizes input in ConvNet and RNN

Open AlexTS1980 opened this issue 4 years ago • 0 comments

Thanks for the code. I have a couple of questions regarding tensor sizes.

  1. The dataloader creates tensors of size X = (#videos, #frames, 3, H, W) and y = (#videos, 1). There's a loop over #videos in the train method, but in my implementation it only ever returned index=0, so the input to the ConvNet has size (#videos, #frames, 3, H, W). Is this correct?

  2. In the ConvNet's forward method there's a loop over the #frames in the video. It flattens the pooling layer's output into a vector, giving a tensor of size (#videos, #frames, CNN_embed_dim), which is both the output of the ConvNet and the input to the RNN. Is this right?
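
To make the shapes in point 2 concrete, here is a minimal sketch of what I think is happening. `ToyFrameEncoder` is a hypothetical stand-in I wrote for illustration, not the repo's actual ConvNet, and the sizes (2 videos, 5 frames, 16x16 images, embedding dimension 8) are made up.

```python
import torch
import torch.nn as nn


class ToyFrameEncoder(nn.Module):
    """Hypothetical stand-in for the ConvNet encoder (not the repo's model)."""

    def __init__(self, embed_dim=8):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(4, embed_dim)

    def forward(self, x):
        # x: (#videos, #frames, 3, H, W)
        embeddings = []
        for t in range(x.size(1)):                 # loop over frames only
            frame = x[:, t]                        # (#videos, 3, H, W)
            feat = self.pool(torch.relu(self.conv(frame)))  # (#videos, 4, 1, 1)
            feat = feat.flatten(start_dim=1)       # pooled map -> vector, (#videos, 4)
            embeddings.append(self.fc(feat))       # (#videos, embed_dim)
        # Stack along a new time axis: (#videos, #frames, embed_dim)
        return torch.stack(embeddings, dim=1)


videos = torch.randn(2, 5, 3, 16, 16)              # made-up batch of 2 videos
encoder = ToyFrameEncoder(embed_dim=8)
cnn_out = encoder(videos)
print(cnn_out.shape)
```

In this sketch the only explicit loop is over frames; every layer processes all videos in the batch at once along dimension 0.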

I don't quite understand how the RNN processes the batch, i.e. the number of videos. Is there some internal loop over it that I can't find in the code?
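
My current guess is that the batch dimension is handled inside the RNN layer itself. The sketch below checks this on a plain `nn.LSTM`, assuming the repo's RNN is a standard PyTorch LSTM built with `batch_first=True` (I have not confirmed that). The sizes are made up, and the explicit per-video loop exists only for comparison.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed setup: a standard LSTM that takes (batch, seq, feature) input.
lstm = nn.LSTM(input_size=8, hidden_size=6, num_layers=1, batch_first=True)
seqs = torch.randn(2, 5, 8)        # (#videos, #frames, CNN_embed_dim)

with torch.no_grad():
    # One call handles every video in the batch; no Python loop over videos.
    batched_out, (h_n, c_n) = lstm(seqs)

    # Feeding the videos one at a time and concatenating the results.
    looped_out = torch.cat(
        [lstm(seqs[i:i + 1])[0] for i in range(seqs.size(0))], dim=0
    )

print(batched_out.shape)           # (#videos, #frames, hidden_size)
print(h_n.shape)                   # (num_layers, #videos, hidden_size)
print(torch.allclose(batched_out, looped_out, atol=1e-5))
```

If the two outputs match, each video is processed independently along dimension 0, which would explain why there is no visible loop over videos in the model code.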

AlexTS1980, Jul 04 '19 15:07