lingvo ASR Frame Size

Hi! In librispeech.py, I found some notes about default input_size:

# Data consists 240 dimensional frames (80 x 3 frames), which we
# re-interpret as individual 80 dimensional frames. See also,
# LibrispeechCommonAsrInputParams.
ep.input_shape = [None, None, 80, 1]

I have failed to find the code that re-interprets the 80 x 3 frames as 80-dimensional frames. I am confused about the real frame size: is it 240 or 80?

In spectrum_augmenter.py, there is a line:

p.Define('stack_height', 3, 'Number of frames stacked on top of each other')

Are the two 3s above related? Should I change 'stack_height' to 1 when 'ep.input_shape = [None, None, 80, 1]'?

Jul 04 '19 10:07 ColainCYY

It's an unfortunate gotcha in the code. The frontend is configured to generate frames of size 240, by taking the concatenation of 3 frames of size 80 at a time. However, the model uses frames of size 80, at 3x the frame rate. So, ignore the 240 number. It's really 80.
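To make the "80 at 3x the frame rate" reading concrete, here is a minimal NumPy sketch. It is not lingvo code: the function name `unstack_frames` is hypothetical, and it assumes each 240-dimensional frame is three consecutive 80-dimensional frames concatenated in time order. Under that assumption, a plain row-major reshape recovers the individual frames.

```python
import numpy as np


def unstack_frames(stacked, feature_dim=80):
    """Re-interpret stacked frames as individual frames.

    Hypothetical helper, not part of lingvo. Assumes `stacked` has shape
    [batch, time, k * feature_dim], where each frame is k consecutive
    feature_dim-sized frames concatenated in time order.
    Returns an array of shape [batch, time * k, feature_dim, 1].
    """
    batch, time, dim = stacked.shape
    if dim % feature_dim != 0:
        raise ValueError("frame size must be a multiple of feature_dim")
    stack = dim // feature_dim
    # Row-major reshape splits each stacked frame back into `stack` frames.
    return stacked.reshape(batch, time * stack, feature_dim, 1)


# Build 6 frames of size 80, stack them 3 at a time into 2 frames of size 240.
frames = np.arange(2 * 6 * 80, dtype=np.float32).reshape(2, 6, 80)
stacked = np.concatenate(
    [frames[:, 0::3], frames[:, 1::3], frames[:, 2::3]], axis=-1)

unstacked = unstack_frames(stacked)
print(stacked.shape, unstacked.shape)
```

The unstacked result has the same layout as `ep.input_shape = [None, None, 80, 1]`: three times as many time steps, each of size 80.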

Jul 04 '19 13:07 drpngx

Thanks very much! And should I keep 'stack_height' at 3?

Jul 05 '19 01:07 ColainCYY

@drpngx So the input frames are at a low frame rate?

Jul 13 '19 09:07 zh794390558

We generate low frame rates in the frontend, because it was unimplemented, but the model actually works on the regular framerate.

Jul 13 '19 09:07 drpngx

"We generate low frame rates in the frontend"

I see.

Jul 16 '19 03:07 zh794390558
