Dify speech to text feature documentation request

Self Checks

[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing?

Hi team, I am trying to enable speech to text functionality on the Dify chat interface but I can't find any pointers as to how to do this.

The only mention of speech to text I can find is here: https://docs.dify.ai/user-guide/using-dify-apps/conversation-application#speech-to-text

2. Describe the feature you'd like to see

I would like to see a tutorial page in the documentation describing how to enable the speech to text functionality.

3. How will this feature improve your workflow or experience?

This will make it possible to build products that have a voice interface.

4. Additional context or comments

I think this will be of high value and I am also willing to pay for this feature if needed.

5. Can you help us with this feature?

[X] I am interested in contributing to this feature.

Apr 04 '24 00:04 teodorpopescuQB

Hey @teodorpopescuQB! Great to see you diving into another feature of Dify. Always a pleasure to cross paths with you here.

I understand your need for more detailed documentation on enabling and using the speech to text functionality within the Dify chat interface. Unfortunately, the current documentation and repository do not provide a step-by-step guide on how to do this. However, I can provide some general guidance based on the information available.

For speech to text to be available in the Dify chat interface, the feature must be enabled during the application programming phase. Once it is enabled, a voice input icon appears in the input box on the web application side. Clicking this icon converts voice input into text ^1^.

There was a similar issue raised in the past where users were unable to use the speech to text feature in the chat interface. The discussion suggested checking if the settings in prompt editing are turned on and verifying if the correct API key is being used. The feature uses OpenAI's Whisper-1, and the user mentioned they were using an Azure OpenAI key [^2^].

Unfortunately, the specific steps to enable the speech to text functionality during the application programming phase, the specific settings in prompt editing that need to be turned on, and how the Dify chat interface interacts with OpenAI's Whisper-1 for the speech to text feature are not clearly documented in the repository.

As for prerequisites, the device environment must be authorized to use the microphone ^1^.
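As a rough illustration of the microphone prerequisite, here is a minimal TypeScript sketch of how a web page could check whether microphone access is granted. It is not taken from the Dify code base: `canUseMicrophone` and `MediaDevicesLike` are hypothetical names, and the only real API assumed is the browser's `navigator.mediaDevices.getUserMedia`.

```typescript
// Minimal shape of the browser API this sketch relies on (hypothetical
// interface, declared here so the function can also be tried with a stub).
interface MediaDevicesLike {
  getUserMedia(constraints: { audio: boolean }): Promise<{
    getTracks(): { stop(): void }[];
  }>;
}

// Returns true if an audio stream could be opened, false if the API is
// missing or the request was rejected (for example, permission denied).
async function canUseMicrophone(
  devices: MediaDevicesLike | undefined,
): Promise<boolean> {
  if (!devices) return false;
  try {
    const stream = await devices.getUserMedia({ audio: true });
    // Release the microphone right away; this is only a permission probe.
    stream.getTracks().forEach((track) => track.stop());
    return true;
  } catch {
    return false;
  }
}
```

In a browser this would be called as `canUseMicrophone(navigator.mediaDevices)`. Note that browsers generally expose `getUserMedia` only in secure contexts (HTTPS or localhost), which may matter for self-hosted deployments.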

Your willingness to contribute to the development or documentation of this feature is greatly appreciated. You can start by creating a new issue detailing the steps you took to enable the feature and any challenges you encountered. This will help others who might be facing the same issue and contribute to improving the Dify documentation.

[^2^]: Can't use stt in Chat,while use source code deployment.

Apr 04 '24 00:04 dosubot[bot]

@crazywoolala as discussed in Discord, we should not hide the speech features if no language model (provider) has been configured. Instead, we should display them with the switch disabled, add an error message saying that no language model is set up, and link to the model settings.

Apr 28 '24 13:04 perzeuss
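
The behaviour proposed in the last comment could be sketched roughly as follows. This is only an illustration of the idea, not Dify's actual implementation: `SpeechToggleState`, `speechToggleState`, and the `MODEL_SETTINGS_PATH` value are all hypothetical.

```typescript
// Hypothetical view state for the speech switch; not Dify's real types.
interface SpeechToggleState {
  visible: boolean;
  disabled: boolean;
  message?: string;
  settingsLink?: string;
}

// Assumed location of the model settings page (illustrative value only).
const MODEL_SETTINGS_PATH = "/settings/model-provider";

// The switch is always shown. Without a configured model provider it is
// disabled and carries an explanation plus a link to the model settings.
function speechToggleState(hasModelProvider: boolean): SpeechToggleState {
  if (hasModelProvider) {
    return { visible: true, disabled: false };
  }
  return {
    visible: true,
    disabled: true,
    message: "No language model is set up.",
    settingsLink: MODEL_SETTINGS_PATH,
  };
}
```

Keeping the switch visible in both cases is the point of the proposal: the feature stays discoverable, and the message and link tell the user what is missing.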