multistream-cnn
Question about the tensor size between single-stream and multi-stream
Thanks again for your great work; the results are very impressive.
While reading the script "run_multistream_cnn_1a.sh", I had a question about the tensor sizes.
Lines 144-150 define the single-stream part, and the last of these lines is:
conv-relu-batchnorm-layer name=cnn5 $cnn_opts height-in=10 height-out=10 time-offsets=-1,0,1 height-offsets=-1,0,1 num-filters-out=256
I assume the output of this layer has size [length_of_seq, height, num_filters] (with batch size = 1). Treating the spectrum like an image: length_of_seq depends on the actual input, height = 10, and num_filters = 256.
Next, this output is fed into the multi-stream part (lines 152-207), whose first line is:
relu-batchnorm-dropout-layer name=tdnn6a $affine_opts input=cnn5 dim=512
It looks like an affine transformation is applied here, mapping [length_of_seq, 10, 256] to [length_of_seq, 10, 512], and the remaining layers all keep dim=512.
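To make the shapes concrete, here is a minimal NumPy sketch of my reading (A), next to an alternative reading (B) in which the height and filter axes are flattened into one per-frame vector before the affine layer. This is not Kaldi code: the array names, the random weights, and the sequence length T = 7 are made up for illustration, and I am not sure which reading matches what nnet3 actually does.

```python
import numpy as np

rng = np.random.default_rng(0)

# T is an arbitrary sequence length chosen for illustration;
# H and F come from cnn5 (height-out=10, num-filters-out=256), D from tdnn6a (dim=512).
T, H, F, D = 7, 10, 256, 512

# Hypothetical cnn5 output for a single utterance (batch size = 1).
cnn5_out = rng.standard_normal((T, H, F))

# Reading A: the affine map acts on the filter axis only, keeping the height axis.
W_a = rng.standard_normal((F, D))
out_a = cnn5_out @ W_a                      # shape (T, H, D)

# Reading B: height and filters are flattened to one vector per frame first.
W_b = rng.standard_normal((H * F, D))
out_b = cnn5_out.reshape(T, H * F) @ W_b    # shape (T, D)

print("reading A:", out_a.shape)
print("reading B:", out_b.shape)
```

Under reading A the height axis survives into the multi-stream part; under reading B each frame becomes a single 512-dimensional vector.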
Am I right? Thanks so much.