
I am not getting any text for decoding

Open: viju2008 opened this issue on Dec 17 '17 (14 comments)

I have followed the steps given.

However, I always get the following output from the ASR server:

    {"status":"ok","data":[{"confidence":0.862751,"text":""}],"interrupted":"endofspeech","time":1080}

Could you please explain how to check the ASR logs?

viju2008, Dec 17 '17

Sometimes the only text I get is "NO".

viju2008, Dec 17 '17

I think you might be seeing the same problem that I posted about in #31. If I switch in a model that I built in January, the recognition is great. With the latest Kaldi, I get nothing but [NOISE] tokens. I posted a question to kaldi-help (https://groups.google.com/forum/#!topic/kaldi-help/1N4aVb75IdU), but DP did not have any ideas.

mikenewman1, Dec 19 '17

I found the problem. To run the latest (batchnorm) models, you need to add one line after loading the model:

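    // SetBatchnormTestMode is declared in Kaldi's nnet3/nnet-utils.h.
    {
      bool binary;
      kaldi::Input ki(nnet3_rxfilename_, &binary);
      trans_model_->Read(ki.Stream(), binary);  // transition model comes first
      nnet_->Read(ki.Stream(), binary);         // then the acoustic model

      // This is the crucial line: put batchnorm components into test mode
      // so they use their stored statistics while decoding.
      kaldi::nnet3::SetBatchnormTestMode(true, &(nnet_->GetNnet()));
    }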

Note that this only affects newer models (built with Kaldi source from after about March 2017). For full compatibility with the latest Kaldi, these two calls are probably a good idea as well:

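      // Disable dropout for decoding (also declared in nnet3/nnet-utils.h).
      kaldi::nnet3::SetDropoutTestMode(true, &(nnet_->GetNnet()));
      // Fold components such as dropout and batchnorm into neighbouring
      // layers where possible, which makes test-time computation cheaper.
      kaldi::nnet3::CollapseModel(kaldi::nnet3::CollapseModelConfig(),
                                  &(nnet_->GetNnet()));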

This is shamelessly lifted from, e.g., kaldi/src/online2bin/online2-wav-nnet3-latgen-faster.cc.

mikenewman1, Jan 24 '18

I posted some details in https://github.com/dialogflow/asr-server/issues/37 on what helped me get past this "issue."

formigone, Aug 30 '18

In which file do we add the line SetBatchnormTestMode(true, &(nnet_->GetNnet()));?

dpny518, Oct 23 '18

In Nnet3LatgenFasterDecoder.cc

(in the function Nnet3LatgenFasterDecoder::Initialize)

mikenewman1, Oct 23 '18

@viju2008 I am in the same situation now. Did you solve the problem?

hc038, Nov 11 '20

See the posts above. The code needed updating to support batchnorm, and after this fix everything worked fine. Note, however, that I haven't used this code in years, so it may be broken again.

mikenewman1, Nov 11 '20

Sorry. You could try asking in the usual Kaldi help channel.

On Wednesday, November 11, 2020 at 6:49 AM, hc038 wrote (Re: [dialogflow/asr-server] I am not getting any text for decoding (#32)):

@mikenewman1 Thanks for the quick reply. I have added that line to Nnet3LatgenFasterDecoder.cc, but I am getting this error:

[Image removed by sender: https://user-images.githubusercontent.com/73985177/98808303-e1850f00-2441-11eb-8876-38d61e92a838.png]


mikenewman1, Nov 11 '20

I am trying this with the system mic (recognition using the web browser). Does it automatically convert the audio to 16000 Hz?

hc038, Nov 12 '20

The JavaScript code downsamples browser input to 16000 Hz: https://github.com/dialogflow/asr-server/blob/master/asr-html/res/recorderWorker.js#L70

realill, Nov 12 '20

Thanks, Ilya.

hc038, Nov 13 '20

This server works fine with the curl command, but with the system mic (recognition using the web browser) I only get this: [image]. Any suggestions?

hc038, Nov 13 '20

  • Ensure the browser records data correctly. I believe there is a way to import a recorded stream.
  • Use the Chrome Developer Console to debug the JavaScript.
  • I do not remember whether the JavaScript client uses multipart to send data to the server, but this may be a difference between curl and JavaScript.
  • You can emulate multipart data sending with curl as well and see if it works.

At the end of the day, if curl works, you can write your own code to emulate what it does. But without multipart you won't be able to productionize it very well: multipart allows "online" decoding, where the stream is decoded as you speak. So you'd better figure it out. ;)

realill, Nov 13 '20