POST request voice - Response interprets only the first few words of the WAV file.
I think there is a bug with utterances longer than 2-3 words. When I send a mediaBlob as an audio/wav file, well below 10 seconds long and spoken in clear English, the response suggests that only the first two words were registered. The audio file itself is complete, since I can listen to the entire five seconds, whereas Wit's response seems to analyze only the first second.
Here is what the response looks like:
{
  "text": "One"
}
{
  "text": "One,"
}
{
  "text": "One, two,"
}
{
  "entities": {
    "wit_actions:wit_actions": [
      {
        "body": "chat",
        "confidence": 0.999,
        "end": 15,
        "entities": [],
        "id": "4570193529751160",
        "name": "wit_actions",
        "role": "wit_actions",
        "start": 10,
        "suggested": true,
        "type": "value",
        "value": "chat"
      }
    ]
  },
  "intents": [
    {
      "confidence": 0.9988,
      "id": "970864790488880",
      "name": "wit_command_speak"
    }
  ],
  "text": "One, two, chat.",
  "traits": {
    "wit$sentiment": [
      {
        "confidence": 0.7213,
        "id": "5ac2b50a-44e4-466e-9d49-bad6bd40092c",
        "value": "neutral"
      }
    ]
  }
} string
Here's the request and header snippet for the above:
I expect Wit to interpret the entire sentence rather than just the first few words. I said much more than "one, two". I appreciate your help!
APP ID 751387866245902
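For anyone reproducing this, the response body shown above appears to be several JSON objects sent back to back: partial transcripts first, then a final object with intents and entities. The following is a minimal Python sketch for splitting such a body and reading the last transcript. The helper names `split_json_objects` and `final_transcript` are hypothetical, and treating the last object with a "text" field as the final result is an assumption based only on the output pasted here.

```python
import json
from typing import Any, Dict, List


def split_json_objects(body: str) -> List[Dict[str, Any]]:
    """Split a string of back-to-back JSON objects into a list of dicts."""
    decoder = json.JSONDecoder()
    objects: List[Dict[str, Any]] = []
    pos = 0
    while pos < len(body):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(body) and body[pos].isspace():
            pos += 1
        if pos >= len(body):
            break
        obj, pos = decoder.raw_decode(body, pos)
        objects.append(obj)
    return objects


def final_transcript(body: str) -> str:
    """Return the "text" of the last object that has one (assumed final)."""
    texts = [o["text"] for o in split_json_objects(body)
             if isinstance(o, dict) and "text" in o]
    return texts[-1] if texts else ""


# Shortened version of the response pasted in this issue.
sample = (
    '{"text": "One"}\n'
    '{"text": "One,"}\n'
    '{"text": "One, two,"}\n'
    '{"text": "One, two, chat.", "intents": []}'
)
print(final_transcript(sample))
```

Comparing the final "text" against what was actually spoken makes it easy to see where the transcript stops.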
Hi @glitjch, this is likely due to ASR endpointing. ASR endpointing detects silences to reduce latency. Is the issue happening consistently for you? Can you try speaking faster to see if you get a longer transcript, to confirm the hypothesis?
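To illustrate the hypothesis, here is a toy Python sketch of silence-based endpointing. It is not Wit's implementation: the function name `endpoint_index`, the per-frame energy values, the threshold, and the silence limit are all made up for illustration. It only shows why a pause between words can end the transcript early, while the same words spoken without a long pause are kept.

```python
from typing import Sequence


def endpoint_index(energies: Sequence[float],
                   threshold: float = 0.1,
                   max_silent_frames: int = 3) -> int:
    """Return how many frames are kept before the endpointer stops listening."""
    speech_started = False
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            # Speech frame: reset the silence counter.
            speech_started = True
            silent_run = 0
        elif speech_started:
            silent_run += 1
            if silent_run >= max_silent_frames:
                # Enough consecutive silence: cut here, later frames are dropped.
                return i + 1
    return len(energies)


# Hypothetical frame energies: same words, with and without a long pause.
slow = [0.0, 0.8, 0.7, 0.0, 0.0, 0.0, 0.9, 0.8, 0.0, 0.9]
fast = [0.0, 0.8, 0.7, 0.0, 0.9, 0.8, 0.0, 0.9, 0.7, 0.0]

print(endpoint_index(slow))  # stops after the three-frame pause
print(endpoint_index(fast))  # keeps every frame
```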
I'm having the same issue. Is there a way to configure these cutoff filters? I can confirm that if you speak faster with no hesitancy, the transcript is longer. It also seems to be an issue if one elongates the pronunciation of a word, as often happens naturally in speech.
Closing due to no movement on the issue. Please re-open or file a new task should the issue persist.