Suggestion on latency and multilingual issues

Open JunseokLee98 opened this issue 1 year ago • 0 comments

Hello, I am writing these suggestions in response to your review request.

As mentioned in the README, one of the main features of aura-voice is low response latency. However, this claim is not supported by any numerical comparison with existing applications (e.g., gpt-4o-mini). If you want to emphasize latency as a strength, I recommend adding a quantitative comparison between your application and existing ones. That would make the claim more convincing to users.
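To make the suggestion concrete, here is a minimal sketch of how such a comparison could be measured. It is not aura-voice's code: `measure_latency` and `compare` are hypothetical helpers, and the callables passed in are assumed to be functions you write that each perform one full request/response round trip against a given system.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    call: Callable[[], object],
    runs: int = 20,
    warmup: int = 2,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time `call` repeatedly and summarize the latencies in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        call()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(runs):
        start = clock()
        call()
        samples.append(clock() - start)
    samples.sort()
    # Nearest-rank 95th percentile of the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "runs": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }


def compare(
    systems: Dict[str, Callable[[], object]],
    runs: int = 20,
    warmup: int = 2,
) -> Dict[str, Dict[str, float]]:
    """Run the same measurement for each named system (names are up to you)."""
    return {
        name: measure_latency(call, runs=runs, warmup=warmup)
        for name, call in systems.items()
    }
```

For example, `compare({"aura-voice": call_aura_voice, "baseline": call_baseline})` would return one summary per system, where both callables are placeholders you would implement yourself. Reporting the median and 95th percentile alongside the mean, with the same input and network conditions for every system, would probably be enough to back up the README claim.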

My other feedback concerns the multilingual issue mentioned in closed issue #14. It seems that aura-voice still has this issue and that you are trying to solve it by modifying the system prompt. I recommend explicitly listing the supported languages in the README. You could also consider changing your Whisper model, because it still shows the same problem that aura-voice does. I don't know the exact reason why these models cause the 'mismatch-language' issue (i.e., I don't know whether it comes from gpt-4o or from Whisper).
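As an illustration of limiting the supported languages in code as well as in the README, here is a minimal sketch. Everything in it is an assumption on my side: the `SUPPORTED_LANGUAGES` table, the default, and the idea that your transcription step gives you a language code for the user's speech. It does not identify whether the mismatch comes from gpt-4o or Whisper; it only pins the reply language explicitly.

```python
from typing import Optional

# Hypothetical allow-list; the real set is whatever the README commits to.
SUPPORTED_LANGUAGES = {"en": "English", "ko": "Korean"}
DEFAULT_LANGUAGE = "en"


def resolve_language(detected: Optional[str]) -> str:
    """Return a supported language code, falling back to the default."""
    code = (detected or "").strip().lower()
    return code if code in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


def build_system_prompt(base_prompt: str, detected: Optional[str]) -> str:
    """Append an explicit reply-language instruction to the system prompt."""
    name = SUPPORTED_LANGUAGES[resolve_language(detected)]
    return f"{base_prompt}\nAlways reply in {name}, regardless of other cues."
```

With this, an unsupported or missing language code falls back to the default instead of letting the model pick a language on its own, which at least makes the behavior predictable and easy to document.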

Additionally, note that both Whisper and GPT-4o-mini support multiple languages.

I hope my feedback is helpful for your ongoing project.

Sep 10 '24 04:09 JunseokLee98