Driver-Intention-Prediction: Question about Conv-layer and Conv_Block

We are running main_inside.py and main_outside.py, and I want to know whether the Conv_layer part of main_outside.py matches the picture.

If not, it is impossible to obtain an output of size 64×37×59 from a 3x112x176 input with a 3x3 conv layer (stride 1) and a 2x2 max pool (stride 2).
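The size claim can be checked with the output-size formula PyTorch documents for `nn.Conv2d` and `nn.MaxPool2d` (dilation 1, floor rounding). The channel count does not affect the spatial size. This is a minimal pure-Python sketch. The helper names `out_size` and `conv3_pool2` are hypothetical and do not come from the repository:

```python
from typing import Tuple


def out_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length of one spatial dim for a conv/pool layer.

    Same formula as PyTorch nn.Conv2d / nn.MaxPool2d with dilation=1
    and floor rounding: floor((n + 2*padding - kernel) / stride) + 1.
    """
    return (n + 2 * padding - kernel) // stride + 1


def conv3_pool2(h: int, w: int, padding: int = 0) -> Tuple[int, int]:
    """3x3 conv with stride 1, then 2x2 max pool with stride 2."""
    h, w = out_size(h, 3, 1, padding), out_size(w, 3, 1, padding)
    return out_size(h, 2, 2), out_size(w, 2, 2)


print(conv3_pool2(112, 176))             # (55, 87): no padding
print(conv3_pool2(112, 176, padding=1))  # (56, 88): "same" padding
```

With or without padding, the result is roughly half of 112×176. A stride-1 conv plus one 2×2 pool therefore cannot reach 37×59, which is roughly a third.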

Please provide an answer. Thank you.

Dec 22 '23 06:12 jihwan722

I mean the conv_layers in the classifier shown in the picture below. Are they in the code you provided?

Dec 22 '23 07:12 jihwan722

Dear @jihwan722, there is a convLSTM encoder before the decoder. Table II refers to the output after the encoder. Did you include it in your pipeline?

Dec 25 '23 06:12 yaorong0921

I mean that an output of size 64×37×59 (Conv Block 0's output size) cannot be obtained from a 32x112x176 input with a 3x3 conv layer (stride 1) and a 2x2 max pool (stride 2). Please answer this point.

Dec 26 '23 04:12 jihwan722

@jihwan722, please use a 3x3 conv layer with stride 3 to shrink the input from 112x176 to 37x59 first. The code for the classifier is not included in main_outside.py.
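The suggested 3x3, stride-3 conv can be checked with the same floor-mode output-size formula used by PyTorch's `nn.Conv2d` (dilation 1). The sketch below is an arithmetic check only. `stride3_conv` is a hypothetical helper, and which padding the authors actually used is not stated in the thread:

```python
def out_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    # floor((n + 2*padding - kernel) / stride) + 1  (dilation 1, floor mode)
    return (n + 2 * padding - kernel) // stride + 1


def stride3_conv(h: int, w: int, pad_h: int = 0, pad_w: int = 0):
    """Spatial size after a 3x3 conv with stride 3 and (pad_h, pad_w) padding."""
    return out_size(h, 3, 3, pad_h), out_size(w, 3, 3, pad_w)


for pads in [(0, 0), (1, 1), (0, 1)]:
    print(pads, stride3_conv(112, 176, *pads))
# (0, 0) (37, 58)
# (1, 1) (38, 59)
# (0, 1) (37, 59)
```

Under this formula, symmetric padding gives 37×58 or 38×59. Exactly 37×59 only comes out with no padding on the height and padding 1 on the width. In PyTorch that would correspond to something like `nn.Conv2d(32, 64, kernel_size=3, stride=3, padding=(0, 1))`, but this is an assumption and not confirmed by the authors.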

Dec 27 '23 02:12 yaorong0921

Thanks for your reply! I'd like to ask more about fusing the two modules, i.e., the inside module and the outside module.

First, the inside images are at 25 fps and the outside images are at 30 fps. Also, when we build the trainloader we use training_data from dataset.py, but training_data_inside has n_samples = 1 per video while training_data_outside has n_samples = 10 per video.

So we want to fuse the two modules, but the number of samples differs between them.

Could we get the classifier module code?

How can we combine the two modules?

Dec 27 '23 03:12 jihwan722

Quoting @yaorong0921: "please use a 3x3 conv layer with stride 3 to shrink the input size from 112x176 to 37x59 first. Code about classifier is not included in main_outside.py."

We used a 3x3 conv layer with stride 3 (without a pooling layer), followed by a 3x3 conv layer with stride 1 and a 2x2 max pooling with stride 2. That gives 17x28, which differs from the 12x20 size in the paper. Please let me know the exact method. @yaorong0921
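The 17x28 result follows from the PyTorch floor-mode output-size formula (dilation 1), assuming the stride-1 conv is unpadded. The last line is only an arithmetic comparison. Another 3x3, stride-3 step with padding (0, 1) would give 12x20, but nothing in the thread confirms that the paper's network does this. The helper names are hypothetical:

```python
def out_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    # floor((n + 2*padding - kernel) / stride) + 1  (dilation 1, floor mode)
    return (n + 2 * padding - kernel) // stride + 1


def conv3_pool2(h: int, w: int, padding: int = 0):
    """3x3 conv with stride 1, then 2x2 max pool with stride 2."""
    h, w = out_size(h, 3, 1, padding), out_size(w, 3, 1, padding)
    return out_size(h, 2, 2), out_size(w, 2, 2)


print(conv3_pool2(37, 59))             # (17, 28): unpadded conv, as reported
print(conv3_pool2(37, 59, padding=1))  # (18, 29): "same"-padded conv
# For comparison: a 3x3 / stride-3 step with padding (0, 1) instead
print(out_size(37, 3, 3, 0), out_size(59, 3, 3, 1))  # 12 20
```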

Dec 29 '23 06:12 johook

Dear @johook, were you able to get 37x59 first after the 3x3 conv layer with stride 3? Do you mean that you could not get the dimensions given in Table 2?

Jan 01 '24 10:01 yaorong0921

Dear @jihwan722, when fusing, the inside and outside videos are fed into two different networks, which may take different numbers of frames. As described in Section IV-B (outside video), the input frames are all taken from the time period before second T with an interval L=5. In IV-C (in-cabin video), a 16-frame clip before second T is used. The classifier can be understood as follows: each branch takes its own frames before second T to compute a feature, and then the two features are fused, as described in Table 2.
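Here is a minimal sketch of that description, built on explicit assumptions:

- T is given in seconds, and frame 0 is the start of the video.
- The outside branch takes every 5th frame from the start up to T.
- The in-cabin branch takes the 16 frames right before T.
- The two features are simply concatenated. The real fusion layers are those in Table 2, which is not reproduced here.

All names are hypothetical and not taken from dataset.py. The idea is that one sample is keyed by (video, T), and each branch derives its own frame indices from that key. This way the 25 vs 30 fps difference and the different n_samples do not have to line up index by index.

```python
from typing import Dict, List

INSIDE_FPS = 25   # in-cabin frame rate mentioned in the thread
OUTSIDE_FPS = 30  # outside frame rate mentioned in the thread
INTERVAL_L = 5    # frame interval for the outside branch (Section IV-B)
CLIP_LEN = 16     # clip length for the in-cabin branch (Section IV-C)


def outside_frames(t_sec: float) -> List[int]:
    """Every L-th outside frame before second T, starting at frame 0 (assumed)."""
    n_before = int(t_sec * OUTSIDE_FPS)  # indices < int(T * fps) count as "before T"
    return list(range(0, n_before, INTERVAL_L))


def inside_frames(t_sec: float) -> List[int]:
    """The last 16 consecutive in-cabin frames before second T."""
    n_before = int(t_sec * INSIDE_FPS)
    return list(range(max(0, n_before - CLIP_LEN), n_before))


def fused_sample(t_sec: float) -> Dict[str, List[int]]:
    """One sample per (video, T); each branch gets its own frame indices."""
    return {"inside": inside_frames(t_sec), "outside": outside_frames(t_sec)}


def fuse(inside_feat: List[float], outside_feat: List[float]) -> List[float]:
    """Late fusion by concatenating the two branch features (assumed; see Table 2)."""
    return inside_feat + outside_feat


sample = fused_sample(2.0)
print(len(sample["inside"]), len(sample["outside"]))  # 16 12
```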

Jan 01 '24 10:01 yaorong0921
