QuartzNet-ASR-pytorch Why using separable conv in C1 and C2 instead of normal conv1d?

Open mohsen-goodarzi opened this issue 3 years ago • 3 comments

Thank you for sharing your great work.

I noticed you have used sepconv_bn in C1 and C2 instead of conv_bn_act. Is it on purpose? Does it give better results?

https://github.com/Kirili4ik/QuartzNet-ASR-pytorch/blob/ec6073ef76d1ce0419bc62065ec746cb12a63efc/model.py#L49

Sep 01 '22 07:09 mohsen-goodarzi

Hi. Separable convolutions are a trick described in the QuartzNet paper. In short, they use far fewer parameters while achieving nearly the same results (so the model is smaller and faster for on-device inference).

Sep 01 '22 07:09 Kirili4ik
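To make the parameter saving concrete, here is a minimal sketch in plain Python that counts the weights of a regular 1D convolution versus a depthwise separable one (a depthwise conv followed by a 1x1 pointwise conv). The helper names `conv1d_params` and `separable_conv1d_params` are hypothetical, and the channel and kernel sizes are illustrative assumptions, not values taken from this repository's `model.py`. Batch norm parameters are ignored.

```python
def conv1d_params(in_ch: int, out_ch: int, kernel_size: int, bias: bool = False) -> int:
    """Weights in a regular 1D convolution: every output channel sees every input channel."""
    return in_ch * out_ch * kernel_size + (out_ch if bias else 0)


def separable_conv1d_params(in_ch: int, out_ch: int, kernel_size: int, bias: bool = False) -> int:
    """Weights in a depthwise separable 1D convolution.

    depthwise: one kernel of length kernel_size per input channel
    pointwise: a 1x1 convolution that mixes channels
    """
    depthwise = in_ch * kernel_size + (in_ch if bias else 0)
    pointwise = in_ch * out_ch + (out_ch if bias else 0)
    return depthwise + pointwise


# Illustrative sizes (assumed for this example only).
in_ch, out_ch, kernel_size = 512, 512, 87

normal = conv1d_params(in_ch, out_ch, kernel_size)
separable = separable_conv1d_params(in_ch, out_ch, kernel_size)

print(f"regular conv1d:   {normal:,} parameters")
print(f"separable conv1d: {separable:,} parameters")
print(f"reduction factor: {normal / separable:.1f}x")
```

With a wide kernel the regular convolution's cost grows as `in_ch * out_ch * kernel_size`, while the separable version pays for the kernel length only once per input channel, which is why the saving is largest in layers with long kernels.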

I see. 👍 I thought they only used separable convs in the B blocks. Thanks for the fast reply.

Sep 01 '22 09:09 mohsen-goodarzi

As far as I remember, the paper is not clear about which blocks use sepconvs. But we tried to fully reproduce the paper, and the model's total parameter count is known. If I remember correctly, we tried using sepconvs everywhere to match the parameter count reported in the paper, and it worked.

Sep 01 '22 09:09 Kirili4ik
