How to calculate 500ms_context from am_500ms_future_context.arch?

Open yuseungwoo opened this issue 4 years ago • 4 comments

Question

Thank you in advance for reading my question.

I have a question.

I am referring to the paper https://research.fb.com/wp-content/uploads/2020/01/Scaling-up-online-speech-recognition-using-ConvNets.pdf and the example at https://github.com/flashlight/wav2letter/tree/master/recipes/streaming_convnets/librispeech

The paper and the example say that the model defined in am_500ms_future_context.arch has 500 ms of future context, but I don't understand why.

Could you explain how the architecture in am_500ms_future_context.arch results in 500 ms of future context?

Best regards,

Seung Woo

yuseungwoo avatar Aug 19 '21 11:08 yuseungwoo

Hey,

You need to calculate the receptive field of your convolutional network, i.e. determine which future and past tokens are used in the computation of a particular output frame.

I believe this was done automatically in our code: we define a function that computes each convolution's receptive field from its parameters and then propagates it to the next layer. cc @vineelpratap, in case I am wrong.
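For readers following along, here is a minimal sketch of that kind of layer-by-layer propagation. It is not the wav2letter implementation. The `Conv1d` class, the `context` function, the three example layers, and the 10 ms frame shift are all assumptions made for illustration, and the layers are not the contents of am_500ms_future_context.arch. The sketch also assumes that output frame t is aligned with input frame t times the cumulative stride. It tracks past and future context separately, because the total receptive field and the future context are different quantities.

```python
from dataclasses import dataclass
from typing import List, NamedTuple


@dataclass
class Conv1d:
    """Hypothetical description of one 1D convolution over time."""
    kernel: int
    stride: int = 1
    dilation: int = 1
    left_pad: int = 0  # frames of padding on the past side


class Context(NamedTuple):
    past_frames: int
    future_frames: int
    receptive_field: int
    future_ms: float


def context(layers: List[Conv1d], frame_shift_ms: float = 10.0) -> Context:
    """Propagate past/future context through a stack of convolutions.

    frame_shift_ms is the duration of one input frame (assumed 10 ms here).
    """
    jump = 1    # distance, in input frames, between neighbouring frames at this depth
    past = 0    # input frames seen before the current frame
    future = 0  # input frames seen after the current frame
    for layer in layers:
        span = layer.dilation * (layer.kernel - 1)
        # An output frame reads left_pad frames to the left of its position
        # and (span - left_pad) frames to the right, at the current jump.
        past += layer.left_pad * jump
        future += (span - layer.left_pad) * jump
        jump *= layer.stride
    return Context(past, future, past + future + 1, future * frame_shift_ms)


# Hypothetical stack: a symmetric strided conv, a symmetric conv, and a causal conv.
example = [
    Conv1d(kernel=5, stride=2, left_pad=2),
    Conv1d(kernel=3, left_pad=1),
    Conv1d(kernel=5, left_pad=4),
]
result = context(example)
print(result)
```

With asymmetric padding, a layer can add to the past context without adding to the future context, so a network can have a large total receptive field while still looking only a short way ahead.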

tlikhomanenko avatar Aug 20 '21 16:08 tlikhomanenko

@tlikhomanenko Thank you for raising a good point. I calculated the receptive field for one particular output frame. [image] Based on my math, the receptive field is about 1.5 s. Is this related to the 500 ms in any way?

airlab-byeol avatar Aug 23 '21 00:08 airlab-byeol

Dear @tlikhomanenko

I appreciate your contribution to this paper and your answer to my question.

I'm very impressed by your work and am studying your model, am_500ms_future_context.arch.

I'm especially interested in reducing model size and in inference speed.

I'd like to ask you something.

According to your paper, the architecture with 250 ms of future context is almost as good as the 500 ms one. However, I can't find it.

Where can I find this model, or could you provide it?

Sincerely,

Seung Woo

yuseungwoo avatar Aug 23 '21 06:08 yuseungwoo

@tlikhomanenko Thank you for raising a good point. I calculated the receptive field for one particular output frame. [image] Based on my math, the receptive field is about 1.5 s. Is this related to the 500 ms in any way?

Hi @airlab-byeol, would you mind explaining how you came up with the figure? It seems really close to the answer, but I don't understand why you used 100 frames as input. Thank you.

nguyenhuy1209 avatar Nov 15 '21 02:11 nguyenhuy1209