lingvo
ASR Frame Size
Hi! In librispeech.py, I found some notes about default input_size:
# Data consists 240 dimensional frames (80 x 3 frames), which we
# re-interpret as individual 80 dimensional frames. See also,
# LibrispeechCommonAsrInputParams.
ep.input_shape = [None, None, 80, 1]
I could not find the code that re-interprets the 80 x 3 frames as 80-dimensional frames. I am confused about the real frame size: is it 240 or 80?
In spectrum_augmenter.py, there is a line:
p.Define('stack_height', 3, 'Number of frames stacked on top of each other')
Are these two 3s related? Should I change 'stack_height' to 1 when 'ep.input_shape = [None, None, 80, 1]'?
It's an unfortunate gotcha in the code. The frontend is configured to generate frames of size 240, by taking the concatenation of 3 frames of size 80 at a time. However, the model uses frames of size 80, at 3x the frame rate. So, ignore the 240 number. It's really 80.
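To make the "240 vs. 80" point concrete, here is a minimal NumPy sketch of the reinterpretation. It is not the lingvo code. It assumes the frontend concatenates 3 consecutive 80-dimensional frames in time order along the feature axis, so that a plain row-major reshape recovers the individual frames. The batch size and sequence length are made-up values for illustration.

```python
import numpy as np

# Hypothetical sizes for illustration only.
batch, stacked_len, stack, feat = 2, 5, 3, 80

# Start from individual 80-dim frames so the round trip can be checked.
frames = np.arange(batch * stacked_len * stack * feat, dtype=np.float32)
frames = frames.reshape(batch, stacked_len * stack, feat)    # [2, 15, 80]

# What a stacking frontend would emit under the assumption above:
# each group of 3 consecutive frames becomes one 240-dim frame.
stacked = frames.reshape(batch, stacked_len, stack * feat)   # [2, 5, 240]

# Reinterpret the same data as 80-dim frames at 3x the frame rate,
# with a trailing channel axis to match [None, None, 80, 1].
unstacked = stacked.reshape(batch, stacked_len * stack, feat, 1)  # [2, 15, 80, 1]

print(stacked.shape, unstacked.shape)
```

Under this assumption no data is moved or dropped: the same values are simply viewed as three times as many frames of one third the width.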
Thanks very much! Should I keep 'stack_height' at 3?
@drpngx So the input frames are at a low frame rate?
We generate low frame rates in the frontend, because it was unimplemented, but the model actually works on the regular framerate.
"We generate low frame rates in the frontend"
I see.