Question about the tensor size between single-stream and multi-stream

Open kaiyikang opened this issue 3 years ago • 0 comments

Thanks for your gorgeous work again; it is a very impressive result.

While reading the script "run_multistream_cnn_1a.sh", I ran into a question about the tensor sizes.

Lines 144-150 define the single-stream part, and the last of those lines is:

conv-relu-batchnorm-layer name=cnn5 $cnn_opts height-in=10 height-out=10 time-offsets=-1,0,1 height-offsets=-1,0,1 num-filters-out=256

I imagine that the output of this layer has size [length_of_seq, height, num_filters] (assuming batch size = 1). A spectrogram is like an image: length_of_seq depends on the actual utterance, height = 10, and num_filters = 256.

In the next step, this output is fed into the multi-stream part (lines 152-207). The first line of that part is:

relu-batchnorm-dropout-layer name=tdnn6a $affine_opts input=cnn5 dim=512

It looks like an affine transformation is applied here, so that [length_of_seq, 10, 256] is mapped to [length_of_seq, 10, 512]. The remaining layers would then all keep dim=512.
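
To make the question concrete, here is a small shape-bookkeeping sketch in plain Python. It does not call any Kaldi code. The function names and the sequence length of 100 frames are hypothetical, chosen only for illustration. Interpretation A is the guess described above. Interpretation B is an alternative, in which height and filters are flattened into one vector per frame before the affine transform. I am not sure which one the script actually does.

```python
# Shape bookkeeping only: these helpers are hypothetical and do not use Kaldi.

def conv_output_shape(num_frames, height_out, num_filters_out):
    """Shape of the cnn5 output as I picture it: (time, height, filters)."""
    return (num_frames, height_out, num_filters_out)


def affine_per_height(shape, dim):
    """Interpretation A (my guess): the affine acts on the filter axis only."""
    num_frames, height, _num_filters = shape
    return (num_frames, height, dim)


def affine_on_flattened(shape, dim):
    """Interpretation B: flatten height * filters per frame, then map to dim.

    Returns the flattened input shape and the resulting output shape.
    """
    num_frames, height, num_filters = shape
    flattened_dim = height * num_filters
    return (num_frames, flattened_dim), (num_frames, dim)


# 100 frames is an arbitrary example length; 10 and 256 come from the cnn5 line.
cnn5 = conv_output_shape(num_frames=100, height_out=10, num_filters_out=256)

print(cnn5)                                 # (100, 10, 256)
print(affine_per_height(cnn5, dim=512))     # (100, 10, 512)
print(affine_on_flattened(cnn5, dim=512))   # ((100, 2560), (100, 512))
```

Under interpretation A the output of tdnn6a would keep the height axis, while under interpretation B each frame would become a single 512-dimensional vector.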

Am I right? Thanks so much.

kaiyikang, Aug 02 '21 09:08