[Feature Request]: Can RAGFlow add a speech-to-text model such as faster-whisper? Is Fish Audio the only text-to-speech model that can be added?

Open ZxnSnowy opened this issue 1 year ago • 5 comments

Is there an existing issue for the same feature request?

  • [X] I have checked the existing issues.

Is your feature request related to a problem?

No response

Describe the feature you'd like

I would like to be able to add speech-to-text models such as Whisper, as well as text-to-speech models such as CosyVoice.

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

ZxnSnowy avatar Oct 17 '24 07:10 ZxnSnowy

Do you mean this? [image]

KevinHuSh avatar Oct 18 '24 01:10 KevinHuSh

Do you mean this? [image]

Yes, how did you add it? Can I add a local model?

ZxnSnowy avatar Oct 21 '24 01:10 ZxnSnowy

Add a valid OpenAI API key.

KevinHuSh avatar Oct 21 '24 04:10 KevinHuSh

Add a valid OpenAI API key.

Thank you for your reply. I was wondering if I could add the local model I downloaded? Examples include faster-whisper and GPT-SoVITS.

ZxnSnowy avatar Oct 21 '24 05:10 ZxnSnowy

What about using Xinference to deploy your LLM model?

KevinHuSh avatar Oct 22 '24 03:10 KevinHuSh

What about using Xinference to deploy your LLM model?

@KevinHuSh Good idea. I have already used Xinference to deploy CosyVoice2-0.5B. My question is: how can we use this model? Do we need to develop the audio workflow ourselves and get the text data back from the Xinference service?

zzc-ccccc avatar May 06 '25 11:05 zzc-ccccc
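
For anyone landing here with the same question: below is a minimal sketch of calling a text-to-speech model deployed on Xinference directly over HTTP, which may help when wiring up your own audio workflow. It assumes Xinference's OpenAI-compatible `/v1/audio/speech` endpoint. The base URL (Xinference's default port 9997), the model UID, and the helper names `build_speech_request` and `synthesize` are assumptions made for illustration; they do not come from this thread or from RAGFlow. Check the voice names and the returned audio format against your own deployment.

```python
import json
import urllib.request

# Assumptions (not from the thread): adjust both values to your deployment.
XINFERENCE_BASE_URL = "http://localhost:9997"  # default Xinference port
MODEL_UID = "CosyVoice2-0.5B"  # hypothetical UID; use the one reported at launch


def build_speech_request(
    text: str,
    voice: str = "",
    base_url: str = XINFERENCE_BASE_URL,
    model_uid: str = MODEL_UID,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style text-to-speech endpoint."""
    payload = {"model": model_uid, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "speech_output.bin", voice: str = "") -> str:
    """Send the request and write the returned audio bytes to out_path.

    The audio format depends on the server and model, so the default file
    name deliberately avoids claiming a specific extension.
    """
    request = build_speech_request(text, voice=voice)
    with urllib.request.urlopen(request, timeout=120) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return out_path
```

Calling `synthesize("Hello from RAGFlow")` with the Xinference server running should write the returned audio to disk. Building the request separately from sending it keeps the payload easy to inspect before any network call is made.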