wit POST request voice - Response only interpreting only first few words in wav file.

Open glitjch opened this issue 2 years ago • 1 comments

I think there is a bug with utterances longer than 2-3 words. When I send a mediaBlob as an audio/wav file, well under 10 seconds long and in clearly spoken English, the response suggests that only the first two words registered. The audio file itself is complete, as I can listen to the entire five seconds, whereas Wit's response seems to analyze only the first second.

Here is what the response looks like:

{
  "text": "One"
}
{
  "text": "One,"
}
{
  "text": "One, two,"
}
{
  "entities": {
    "wit_actions:wit_actions": [
      {
        "body": "chat",
        "confidence": 0.999,
        "end": 15,
        "entities": [],
        "id": "4570193529751160",
        "name": "wit_actions",
        "role": "wit_actions",
        "start": 10,
        "suggested": true,
        "type": "value",
        "value": "chat"
      }
    ]
  },
  "intents": [
    {
      "confidence": 0.9988,
      "id": "970864790488880",
      "name": "wit_command_speak"
    }
  ],
  "text": "One, two, chat.",
  "traits": {
    "wit$sentiment": [
      {
        "confidence": 0.7213,
        "id": "5ac2b50a-44e4-466e-9d49-bad6bd40092c",
        "value": "neutral"
      }
    ]
  }
}
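The response above is several JSON objects sent back to back: partial transcripts first, then a final object with `entities`, `intents`, and `traits`. A minimal sketch of splitting such a body and reading the last transcript might look like the following. The helper names `split_json_chunks` and `final_transcript` are hypothetical, and the sketch assumes the body contains only JSON objects separated by whitespace.

```python
import json


def split_json_chunks(body: str) -> list:
    """Split a body made of back-to-back JSON values into a list of parsed values."""
    decoder = json.JSONDecoder()
    chunks = []
    pos = 0
    while pos < len(body):
        # raw_decode does not accept leading whitespace, so skip it first.
        while pos < len(body) and body[pos].isspace():
            pos += 1
        if pos >= len(body):
            break
        obj, pos = decoder.raw_decode(body, pos)
        chunks.append(obj)
    return chunks


def final_transcript(body: str) -> str:
    """Return the "text" field of the last chunk that has one, or "" if none does."""
    texts = [
        chunk["text"]
        for chunk in split_json_chunks(body)
        if isinstance(chunk, dict) and "text" in chunk
    ]
    return texts[-1] if texts else ""
```

Comparing the last transcript with what was actually spoken makes it easier to see where the transcription stopped.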

Here's the request and header snippet for the above: [screenshot: Screen Shot 2022-03-22 at 6 06 54 PM]

I am expecting Wit to interpret the entire sentence, rather than just the first words. I uttered much more than "one, two". Appreciate your help!

APP ID 751387866245902

Mar 23 '22 01:03 glitjch

Hi @glitjch, this is likely due to ASR endpointing. ASR endpointing detects silences to reduce latency. Is the issue happening consistently for you? Can you try speaking faster to see if you get a longer transcript, to confirm the hypothesis?

Mar 23 '22 17:03 patapizza

I'm having the same issue. Is there a way to configure these cutoff filters? I can confirm that if you speak faster, with no hesitation, the transcript is longer. It also seems to be an issue when one elongates the pronunciation of a word, as often happens naturally in speech.

Jan 17 '23 22:01 bixxibix
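To check the endpointing hypothesis against a specific recording, one option is to measure the longest quiet stretch in the wav file and see whether the transcript stops around that point. The sketch below is a rough local check, not Wit's endpointing algorithm: the function names, the amplitude threshold of 500, and the 20 ms frame size are all assumptions chosen for illustration, and it only handles 16-bit mono PCM.

```python
import sys
import wave
from array import array


def read_mono_pcm16(path):
    """Read a 16-bit mono PCM wav file; return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        samples = array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
        sample_rate = wav.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # wav data is little-endian
    return samples, sample_rate


def longest_silence_seconds(samples, sample_rate, threshold=500, frame_ms=20):
    """Return the longest run of quiet frames, in seconds.

    A frame counts as quiet when its peak absolute amplitude is below
    `threshold` (an assumed value; tune it for the recording).
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    longest = current = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if max(abs(s) for s in frame) < threshold:
            current += len(frame)
            longest = max(longest, current)
        else:
            current = 0
    return longest / sample_rate
```

For example, `longest_silence_seconds(*read_mono_pcm16("utterance.wav"))` (with a hypothetical file name) gives the longest pause in seconds. A long pause right after the last transcribed word would be consistent with the silence-based cutoff described above.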

Closing due to no movement on the issue. Please re-open or file a new task if the issue persists.

Apr 18 '23 09:04 Barbog
