PinYin output in esp_mn_results_t->string (AIS-1690)

Open mike-2020 opened this issue 1 year ago • 5 comments

I observed that esp_mn_results_t->string contains PinYin output if (1) a command word is detected, or (2) mn_state is ESP_MN_STATE_TIMEOUT.

How can I get PinYin output even when no command word is detected? Converting speech audio to PinYin is useful for my use case.

mike-2020, Aug 31 '24 04:08

Any update on this? PinYin output could be useful because it can be interpreted by an LLM.

mike-2020, Sep 13 '24 00:09

Hi @mike-2020, without a fixed vocabulary, the accuracy of the output pinyin will be greatly reduced.

sun-xiangyu, Sep 13 '24 06:09

@sun-xiangyu I found that the VAD on our device is easily triggered by mistake in noisy environments. If the pinyin output could be exposed even when no command word is detected, we could use it to improve the accuracy of VAD detection, which would help the stability of our current products.

Could this function be made available?

Z3ce, Nov 29 '24 08:11

We can expose the raw output of the model, which is either pinyin or phoneme classification, but it may do very little to improve VAD. I have been working on improving VAD performance in noisy environments recently. What SNR (signal-to-noise ratio) do you mean when you say noisy environments?

sun-xiangyu, Nov 29 '24 08:11

Thank you very much. It would be great if this could be opened up. We will hand automatic speech recognition over to the cloud, so reducing VAD false positives alone would be a good improvement for us. We don't have professional equipment on hand, so we may not be able to measure SNR accurately. We just hope that VAD can perform better across different environments. In addition, AEC and NS are not very effective on our equipment.

Z3ce, Nov 29 '24 08:11