
Using the vosk_recognizer_accept_waveform_f interface gives bad results

Open liuweie opened this issue 1 year ago • 3 comments

Hi, I used C++ to call your compiled library on Windows. When I use the vosk_recognizer_accept_waveform interface (which accepts const char* data), the recognition result is perfect, but when I use vosk_recognizer_accept_waveform_f (which accepts const float* data), the result is not very accurate. So I am wondering whether there is a problem with how I use vosk_recognizer_accept_waveform_f?

Here is my code: ` int vosk(std::string wavFile, std::string modelPath) {

  std::ifstream wavin(wavFile, std::ios::binary);
  char buf[48000];
  int final, nread;

  VoskModel* model = vosk_model_new(modelPath.data());
  VoskRecognizer* recognizer = vosk_recognizer_new(model, 16000);

  //wavin.seekg(44, std::ios::beg);
  while (!wavin.eof()) 
  {
      wavin.read(buf, sizeof(buf));
      nread = wavin.gcount();
      int flen = nread / 2;
      float floatBuf[48000];
      for (int i = 0; i < nread/2; i++) 
      {
          floatBuf[i] = static_cast<float>(reinterpret_cast<const int16_t*>(buf)[i]);
      }

      final = vosk_recognizer_accept_waveform_f(recognizer, floatBuf, nread/sizeof(float));
      std::cout << "final is " << final << std::endl;
      if (final) 
      {
          std::cout << coutCH(vosk_recognizer_result(recognizer)) << std::endl;;
      }
      else 
      {
          std::cout << coutCH(vosk_recognizer_partial_result(recognizer)) << std::endl;
      }
  }
  
  final = vosk_recognizer_accept_waveform(recognizer, buf, nread);
  vosk_recognizer_free(recognizer);
  vosk_model_free(model);
  return 0;

} `

liuweie avatar Jul 25 '23 11:07 liuweie

> `final = vosk_recognizer_accept_waveform_f(recognizer, floatBuf, nread/sizeof(float));`

`nread/sizeof(float)` is wrong here. The length argument is the number of samples, and you still have `nread/2` samples even after converting them to float.
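A minimal sketch of that fix, assuming 16-bit mono PCM input and a little-endian host (as on typical Windows machines). The helper name `pcm16BytesToFloat` is hypothetical and not part of vosk-api. It derives the sample count from the byte count by dividing by `sizeof(int16_t)`, so the same number can be passed as the length to `vosk_recognizer_accept_waveform_f`:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of vosk-api): converts raw 16-bit PCM bytes
// to floats. The values keep their int16 magnitude; nothing is normalized.
std::vector<float> pcm16BytesToFloat(const char* bytes, std::size_t byteCount) {
    // Each sample occupies 2 bytes, so the sample count is byteCount / 2.
    const std::size_t sampleCount = byteCount / sizeof(int16_t);
    std::vector<float> samples(sampleCount);
    for (std::size_t i = 0; i < sampleCount; ++i) {
        int16_t s;
        // memcpy avoids alignment and aliasing problems with the char buffer.
        std::memcpy(&s, bytes + i * sizeof(int16_t), sizeof(s));
        samples[i] = static_cast<float>(s);
    }
    return samples;
}

// Inside the read loop from the question, the call could then look like:
//   std::vector<float> floatBuf = pcm16BytesToFloat(buf, nread);
//   final = vosk_recognizer_accept_waveform_f(
//       recognizer, floatBuf.data(), static_cast<int>(floatBuf.size()));
```

Returning a `std::vector<float>` ties the length to the converted data, so the length passed to the recognizer cannot drift from the number of samples actually written.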

nshmyrev avatar Aug 01 '23 14:08 nshmyrev

I also saw this in the C# bindings when calling VoskRecognizer.AcceptWaveform(floatBuffer, numSamplesPerChannel) with a floatBuffer that contains float32 samples in the range [-1.0, 1.0]. Is the function expecting the samples to be scaled differently (e.g. [-32767.0, 32767.0])?

lostromb avatar Sep 27 '23 02:09 lostromb

> Is the function expecting the samples to be scaled differently (e.g. [-32767.0, 32767.0])?

Yes, they have to be in the 16-bit range (about ±32768), not in [-1.0, 1.0].
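As a rough illustration of that answer, normalized samples can be multiplied by 32768 before they are handed to the float interface. The helper name `scaleToPcm16Range` is hypothetical, and the sketch assumes the input really is normalized to [-1.0, 1.0]; it does not clamp out-of-range input:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of vosk-api): rescales normalized float
// samples in [-1.0, 1.0] to the 16-bit PCM magnitude range.
std::vector<float> scaleToPcm16Range(const float* normalized, std::size_t count) {
    constexpr float kScale = 32768.0f;  // assumption: full-scale input is +/-1.0
    std::vector<float> scaled(count);
    for (std::size_t i = 0; i < count; ++i) {
        scaled[i] = normalized[i] * kScale;
    }
    return scaled;
}

// The scaled buffer, not the normalized one, would then be passed on:
//   std::vector<float> pcm = scaleToPcm16Range(input, sampleCount);
//   vosk_recognizer_accept_waveform_f(
//       recognizer, pcm.data(), static_cast<int>(pcm.size()));
```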

nshmyrev avatar Sep 27 '23 04:09 nshmyrev