esp-sr PinYin output in esp_mn_results

I observed that esp_mn_results_t->string contains PinYin output in two cases: (1) a command word is detected; (2) mn_state is ESP_MN_STATE_TIMEOUT.

How can I get PinYin output even when no command word is detected? Converting speech audio to PinYin is useful for my use case.

Aug 31 '24 04:08 mike-2020
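
The observation above can be sketched as a small helper that returns the text only in the states where the thread reports it being filled. This is a minimal, self-contained sketch: the `demo_` types are hypothetical stand-ins for `esp_mn_state_t` and `esp_mn_results_t` from the esp-sr headers, with a simplified layout and an assumed buffer size, so the names and sizes below should not be read as the library's actual definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles on its own.
 * In a real project the corresponding types come from the esp-sr
 * headers; the layout here is simplified on purpose. */
typedef enum {
    DEMO_MN_STATE_DETECTING,   /* still listening, no result yet */
    DEMO_MN_STATE_DETECTED,    /* a command word was detected */
    DEMO_MN_STATE_TIMEOUT      /* listening window ended */
} demo_mn_state_t;

typedef struct {
    char string[256];          /* assumed size, for illustration only */
} demo_mn_results_t;

/* Returns the result text when the state is DETECTED or TIMEOUT and the
 * text is non-empty; returns NULL in every other case. */
const char *demo_pinyin_or_null(demo_mn_state_t state,
                                const demo_mn_results_t *res)
{
    if (res == NULL) {
        return NULL;
    }
    if (state != DEMO_MN_STATE_DETECTED && state != DEMO_MN_STATE_TIMEOUT) {
        return NULL;
    }
    if (res->string[0] == '\0') {
        return NULL;
    }
    return res->string;
}
```

In a detection loop, the caller would pass the state returned for the current audio frame together with the latest results struct, and forward any non-NULL text to whatever consumes it.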

Any update on this? PinYin output could be useful because it can be interpreted by an LLM.

Sep 13 '24 00:09 mike-2020

Hi @mike-2020, without a fixed vocabulary, the accuracy of the output pinyin will be greatly reduced.

Sep 13 '24 06:09 sun-xiangyu

@sun-xiangyu I found that the VAD on our device is easily triggered by mistake in a noisy environment. If the pinyin output were available even when no command word is detected, we could use it to filter out false VAD triggers, which would help improve the stability of our current products.

Can this feature be enabled?

Nov 29 '24 08:11 Z3ce

We can expose the raw output of the model, which is either pinyin or phoneme classification, but it may be of very limited help in improving VAD. I have been working on improving VAD performance in noisy environments recently. I would like to know what SNR (Signal-to-Noise Ratio) your noisy environments correspond to.

Nov 29 '24 08:11 sun-xiangyu

Thank you very much. It would be great if you could expose it. We will hand the automatic speech recognition over to the cloud, so any reduction in VAD false positives would be a significant improvement for us. We don't have professional equipment on hand, so we may not be able to measure SNR accurately. We just hope that VAD can perform better across different environments. In addition, AEC and NS are not very effective on our equipment.

Nov 29 '24 08:11 Z3ce
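
For the SNR question, a rough figure can be obtained without dedicated equipment by comparing two recordings from the device microphone: one with background noise only and one with speech over the same noise. The sketch below is an assumption-laden estimate, not a calibrated measurement: it assumes 16-bit PCM input, assumes the noise level is similar in both recordings, and uses hypothetical function names. It returns the linear power ratio; the value in dB is 10 * log10(ratio).

```c
#include <stddef.h>
#include <stdint.h>

/* Mean power (mean of squared samples) of a 16-bit PCM buffer. */
static double demo_mean_power(const int16_t *pcm, size_t n)
{
    if (pcm == NULL || n == 0) {
        return 0.0;
    }
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += (double)pcm[i] * (double)pcm[i];
    }
    return acc / (double)n;
}

/* Rough linear SNR estimate.
 * noisy_speech: recorded while someone is speaking in the noisy room.
 * noise_only:   recorded in the same room with nobody speaking.
 * Speech power is approximated as (mixed power - noise power).
 * Returns -1.0 when no estimate is possible (empty input, silent noise
 * recording, or mixed power not above noise power). */
double demo_estimate_snr_ratio(const int16_t *noisy_speech, size_t n_speech,
                               const int16_t *noise_only, size_t n_noise)
{
    double p_mix = demo_mean_power(noisy_speech, n_speech);
    double p_noise = demo_mean_power(noise_only, n_noise);
    double p_speech = p_mix - p_noise;

    if (p_noise <= 0.0 || p_speech <= 0.0) {
        return -1.0;
    }
    return p_speech / p_noise;
}
```

A ratio of 10 corresponds to 10 dB and a ratio of 100 to 20 dB, so even this coarse number is enough to say roughly which noise regime the false triggers occur in.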

PinYin output in esp_mn_results_t->string (AIS-1690)