wit
Reconstruct audio time with tokens returned from dictation endpoint
Do you want to request a feature, report a bug, or ask a question about wit? Question
Hi everyone, first of all, I would like to thank y'all for the amazing work on the Wit service.
I have a question regarding the tokens returned from the new /dictation endpoint:
Short version:
Is it possible to precisely reconstruct the audio time length using the tokens' timecodes?
Longer version:
Prior to the /dictation endpoint, we used the /speech endpoint and sent chunks of approximately 20 s of a longer audio file (split on silence). To keep track of the audio time as the transcription proceeds, we use the following equation:
Bytes Per Second (bps) = Sample Rate (Hz) * Word Length (bits) * Channel Count * 0.125
Dividing the byte length of each chunk by this rate gives its duration, which tells us the time interval of the transcribed chunk. Now, using the /dictation endpoint, we are trying to use the tokens' timecodes to reconstruct the same interval, but the values do not match.
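The byte-rate bookkeeping described above can be sketched as follows. This is a minimal sketch, assuming raw PCM chunks with no container header (a WAV header would add bytes that are not audio). The audio format (16 kHz, 16-bit, mono) and the chunk sizes are hypothetical; the issue does not state them, and the function names are illustrative only.

```python
def bytes_per_second(sample_rate_hz: int, word_length_bits: int, channels: int) -> float:
    """Byte rate of raw PCM audio; multiplying by 0.125 (1/8) converts bits to bytes."""
    return sample_rate_hz * word_length_bits * channels * 0.125


def chunk_intervals(chunk_sizes_bytes, sample_rate_hz, word_length_bits, channels):
    """Cumulative (start, end) intervals in seconds for consecutive raw PCM chunks."""
    rate = bytes_per_second(sample_rate_hz, word_length_bits, channels)
    intervals = []
    start = 0.0
    for size in chunk_sizes_bytes:
        end = start + size / rate  # duration = bytes / (bytes per second)
        intervals.append((start, end))
        start = end  # the next chunk begins where this one ends
    return intervals


# Hypothetical 16 kHz, 16-bit, mono audio: 32000 bytes per second,
# so a 528000-byte chunk lasts 16.5 s and a 640000-byte chunk lasts 20 s.
print(bytes_per_second(16000, 16, 1))                    # 32000.0
print(chunk_intervals([528000, 640000], 16000, 16, 1))   # [(0.0, 16.5), (16.5, 36.5)]
```

Because each interval is derived only from byte counts, it does not depend on what the transcription service returns.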
Is there something we need to consider in this reconstruction using the tokens' timecodes?
I am sending an example of the response, which also includes the time interval obtained with the equation. Notice that the total time does not match: the last token ends at 16320 ms (16.32 s), while the chunk sent is 16.5 s long. It may seem a small difference, but accumulated over all chunks it is enough to misalign the text with the audio.
{'end': 16.5, # the length of the chunk sent to /dictation endpoint calculated with the equation
'text': 'Tá, vamos Ponto. Quanto não tem dimensão? Isso não é uma definição, mas é uma característica dele',
'start': 0.0,
'tokens': [{'tokens': [{'end': 0, 'start': 0, 'token': ''},
{'end': 5520, 'start': 4520, 'token': 'Tá,'},
{'end': 6240, 'start': 5520, 'token': 'vamos'},
{'end': 6240, 'start': 6240, 'token': ''}],
'confidence': 0.8972},
{'tokens': [{'end': 7800, 'start': 7800, 'token': ''},
{'end': 10560, 'start': 9560, 'token': 'Ponto.'},
{'end': 10920, 'start': 10560, 'token': ''}],
'confidence': 0.7612},
{'tokens': [{'end': 11700, 'start': 11700, 'token': ''},
{'end': 13320, 'start': 12320, 'token': 'Quanto'},
{'end': 13500, 'start': 13320, 'token': 'não'},
{'end': 13620, 'start': 13500, 'token': 'tem'},
{'end': 14100, 'start': 13620, 'token': 'dimensão?'},
{'end': 14400, 'start': 14100, 'token': 'Isso'},
{'end': 14580, 'start': 14400, 'token': 'não'},
{'end': 14640, 'start': 14580, 'token': 'é'},
{'end': 14760, 'start': 14640, 'token': 'uma'},
{'end': 15120, 'start': 14760, 'token': 'definição,'},
{'end': 15300, 'start': 15120, 'token': 'mas'},
{'end': 15480, 'start': 15300, 'token': 'é'},
{'end': 15540, 'start': 15480, 'token': 'uma'},
{'end': 16020, 'start': 15540, 'token': 'característica'},
{'end': 16320, 'start': 16020, 'token': 'dele'},
{'end': 16320, 'start': 16320, 'token': ''}],
'confidence': 0.8018}]}
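The comparison described above (chunk length from the equation versus the timecode of the last token) can be sketched as below, together with one possible workaround. This is not an answer from the wit maintainers. It assumes, as the example's values suggest, that the token timecodes are milliseconds relative to the beginning of the submitted chunk. Under that assumption, shifting each chunk's tokens by the chunk's byte-derived start time, instead of chaining the last token end of the previous chunks, keeps the per-chunk gap from accumulating. The helper names are hypothetical, and the response is a trimmed copy of the one above.

```python
# Trimmed copy of the response above (only some tokens kept).
response = {
    "start": 0.0,
    "end": 16.5,  # chunk length computed with the byte-rate equation
    "tokens": [
        {"tokens": [{"end": 0, "start": 0, "token": ""},
                    {"end": 5520, "start": 4520, "token": "Tá,"},
                    {"end": 6240, "start": 5520, "token": "vamos"},
                    {"end": 6240, "start": 6240, "token": ""}],
         "confidence": 0.8972},
        {"tokens": [{"end": 16020, "start": 15540, "token": "característica"},
                    {"end": 16320, "start": 16020, "token": "dele"},
                    {"end": 16320, "start": 16320, "token": ""}],
         "confidence": 0.8018},
    ],
}


def last_token_end_ms(resp: dict) -> int:
    """Latest 'end' timecode (ms) over every token in every segment."""
    return max(tok["end"] for seg in resp["tokens"] for tok in seg["tokens"])


def to_global_times(resp: dict, chunk_start_s: float):
    """(word, start_s, end_s) for each non-empty token, offset by the chunk start.

    Assumes token timecodes are milliseconds relative to the chunk's beginning.
    """
    out = []
    for seg in resp["tokens"]:
        for tok in seg["tokens"]:
            if tok["token"]:  # skip the empty boundary tokens
                out.append((tok["token"],
                            chunk_start_s + tok["start"] / 1000,
                            chunk_start_s + tok["end"] / 1000))
    return out


chunk_length_s = response["end"] - response["start"]   # 16.5
tokens_end_s = last_token_end_ms(response) / 1000      # 16.32
print(round(chunk_length_s - tokens_end_s, 2))          # 0.18

# If this were the second chunk, starting at 16.5 s of the full audio:
for word, start_s, end_s in to_global_times(response, chunk_start_s=16.5)[:2]:
    print(word, round(start_s, 2), round(end_s, 2))
# Tá, 21.02 22.02
# vamos 22.02 22.74
```

The 0.18 s difference may simply be trailing audio after the last recognized word, but that is a guess; only the service documentation could confirm how the timecodes are defined.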
Can you please tell me how you got that text in Portuguese? I am only getting English and don't know how to change the output language.
@andysagar it's a setting chosen in the wit.ai platform when creating a new app. There is a dropdown menu with the available languages.
Closing due to no movement on the issue. Please re-open or file a new task if the issue persists.