st-resnet
The dimension of the temporal filter with feature identity
In your paper, $x_{l+1} = x_l * \hat{W}_l + b_l$, where the biases $b_l$ are initialized to 0 and $\hat{W}_l \in \mathbb{R}^{1 \times 1 \times T \times C_l \times C_l}$ are temporal filters with weights initialized by stacking identity mappings between feature channels, $\mathbf{1} \in \mathbb{R}^{1 \times 1 \times 1 \times C_l \times C_l}$, across time $t = 1 \ldots T$. I cannot understand why "1" and W each have five dimensions.
In addition to this question: x_l is W×H×C, but W_l is 1×1×T×C×C, so how is the convolution of x_l and W_l calculated? Thank you for your reply.
Maybe the author made a mistake, and the filter should be 4-D (w×h×T×c)?
In the trained model, the dimension of the temporal filter is 1×T×C_l×C_l, but I still cannot understand it.
Got it! The 1×1×T part of the filter handles the temporal information across a window of T time steps, and the C×C part is used as a channel-level identity mapping. For example, for feature maps with 3 channels we use [1,0,0]; [0,1,0]; [0,0,1].
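To make that concrete, here is a minimal NumPy sketch of how I understand it. It is not the author's code. The helper names `identity_temporal_filter` and `temporal_conv`, the (W, H, T, C) layout of the input, and the unpadded "valid" convolution are all my own assumptions for illustration.

```python
import numpy as np


def identity_temporal_filter(T, C):
    """Stack a C x C identity across T time steps -> shape (1, 1, T, C, C)."""
    return np.tile(np.eye(C), (1, 1, T, 1, 1))


def temporal_conv(x, w, b):
    """Valid temporal convolution (assumed layout, no padding).

    x: feature maps stacked over time, shape (W, H, T_in, C_in)
    w: temporal filter, shape (1, 1, T, C_in, C_out)
    b: bias, shape (C_out,)
    """
    T = w.shape[2]
    T_out = x.shape[2] - T + 1
    out = np.zeros(x.shape[:2] + (T_out, w.shape[4]))
    for t in range(T_out):
        window = x[:, :, t:t + T, :]  # (W, H, T, C_in)
        # Sum over the T time steps and the input channels.
        out[:, :, t, :] = np.einsum("whtc,tcd->whd", window, w[0, 0]) + b
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 5, 3))      # W=4, H=4, 5 time steps, 3 channels
w = identity_temporal_filter(T=3, C=3)
b = np.zeros(3)                         # biases initialized to 0
y = temporal_conv(x, w, b)

print(w.shape)  # (1, 1, 3, 3, 3)
print(y.shape)  # (4, 4, 3, 3)
# With identity stacking, each output channel is just the sum of the same
# input channel over the temporal window; channels are not mixed.
print(np.allclose(y[:, :, 0, :], x[:, :, 0:3, :].sum(axis=2)))  # True
```

So the two leading 1s are the spatial extent (the filter does not look at neighbouring pixels), T is the temporal extent, and the last two dimensions map input channels to output channels.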