Cannot add OpenAI-compatible STT model.
Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.8.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Just add an OpenAI-API-compatible Speech2text model.
I'm using faster-whisper-server as a local OpenAI-compatible API server. It runs in another Docker container on the same computer.
Every API request appears to succeed, yet the Speech2text model does not appear in the model list.
The root cause may be my local whisper server, but since the Dify app reports "Operation successful", it looks like a Dify bug.
✔️ Expected Behavior
The STT model appears in the model list. If the operation fails, I get a notification.
❌ Actual Behavior
The whole process appears to succeed, but the STT model does not appear.
Hey @tetyoka! I'm here to help you with any bugs, questions, or contributions you have for Dify. Let's get this issue sorted out together!
There are a few similar issues and discussions related to adding OpenAI-compatible speech-to-text (STT) models in Dify:
- No voice to text feature in my 0.8.3 version after self-host deployment [1].
- Help: How can the locally deployed Qwen 1.5-72b model be added to Dify [2].
A common cause for an OpenAI-compatible Speech2text model not appearing in the model list, despite successful API requests and success notifications, is failed credentials validation: if validation fails, the model is never listed.
Ensure that the `openai_api_key` provided in the credentials is correct and valid; invalid credentials prevent the model from being listed. Additionally, make sure that the local faster-whisper-server is properly configured and accessible.
Here is a relevant snippet from the test suite that includes a validation step (the imports shown are the usual ones for Dify's model runtime tests):

```python
import os

import pytest

from core.model_runtime.errors.validate import CredentialsValidateFailedError
from core.model_runtime.model_providers.openai.speech2text.speech2text import OpenAISpeech2TextModel


def test_validate_credentials():
    model = OpenAISpeech2TextModel()

    # An invalid key must raise...
    with pytest.raises(CredentialsValidateFailedError):
        model.validate_credentials(model="whisper-1", credentials={"openai_api_key": "invalid_key"})

    # ...while a valid key passes validation.
    model.validate_credentials(model="whisper-1", credentials={"openai_api_key": os.environ.get("OPENAI_API_KEY")})
```
The `OAICompatSpeech2TextModel` class in the Dify codebase shows that the `_invoke` method is responsible for making the API call to the speech-to-text service. It constructs the request from the `api_key` and `endpoint_url` in the credentials; if the `endpoint_url` is misconfigured or the API key is invalid, the request fails and the model will not be listed.
```python
from typing import IO, Optional
from urllib.parse import urljoin

import requests

from core.model_runtime.errors.invoke import InvokeBadRequestError


def _invoke(self, model: str, credentials: dict, file: IO[bytes], user: Optional[str] = None) -> str:
    headers = {}

    # Authenticate with the configured API key, if one is set.
    api_key = credentials.get("api_key")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    # Build the transcription URL from the configured base endpoint.
    endpoint_url = credentials.get("endpoint_url")
    if not endpoint_url.endswith("/"):
        endpoint_url += "/"
    endpoint_url = urljoin(endpoint_url, "audio/transcriptions")

    # POST the audio file as multipart form data, OpenAI-style.
    payload = {"model": model}
    files = [("file", file)]
    response = requests.post(endpoint_url, headers=headers, data=payload, files=files)

    if response.status_code != 200:
        raise InvokeBadRequestError(response.text)

    response_data = response.json()
    return response_data["text"]
```
Make sure that the endpoint_url and api_key in the credentials are correctly set and that the local faster-whisper-server is running and accessible at the specified endpoint_url [3][4][5].
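To rule out the server side, you can hit the transcription endpoint directly with the same kind of request `_invoke` sends. Here is a minimal sketch; the URL, port, and model name are assumptions you should adjust to your faster-whisper-server setup:

```python
# Minimal sanity check against an OpenAI-compatible transcription endpoint.
# The endpoint URL and model id below are placeholders, not Dify defaults.
import requests

endpoint_url = "http://localhost:8000/v1/audio/transcriptions"  # assumed address

with open("sample.wav", "rb") as f:  # any short audio file
    response = requests.post(
        endpoint_url,
        data={"model": "whisper-1"},  # placeholder model id
        files=[("file", f)],
    )

print(response.status_code)
print(response.text)  # a working server returns JSON with a "text" field
```

If this fails outside Dify, the problem is on the server side; if it succeeds, the issue is in how Dify registers the model.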
To continue talking to Dosu, mention @dosu.
I think you need to refresh the page to see the newly added model.
I did. The model does not appear.
Here?
It says 0 models.
And also.
Despite my STT situation, I can add an OpenAI-compatible LLM API (oobabooga).
The LLM API is running on another computer.
I updated Dify to 0.9.1. It still happens.
That's because the current openai-api-compatible/openai speech2text code does not implement the `get_customizable_model_schema` method. I copied the code from the localai provider to fix it.
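For reference, here is a sketch of what that method looks like when adapted from the localai speech2text provider. The entity fields follow Dify's model runtime conventions, but treat the exact import paths and fields as assumptions to verify against your Dify version:

```python
from typing import Optional

from core.model_runtime.entities.common_entities import I18nObject
from core.model_runtime.entities.model_entities import AIModelEntity, FetchFrom, ModelType


def get_customizable_model_schema(self, model: str, credentials: dict) -> Optional[AIModelEntity]:
    # Returning a schema is what lets a customizable (user-added) model
    # show up in the model list; without this method the model is dropped.
    return AIModelEntity(
        model=model,
        label=I18nObject(en_US=model),
        fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
        model_type=ModelType.SPEECH2TEXT,
        model_properties={},
        parameter_rules=[],
    )
```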
I will try your patch!
Thanks for the awesome project! I'm not sure if this is related, but I keep getting an error from OpenAI models hosted at a different base_url. When I run the same workflow with Ollama Llama3.2 it works, but when I swap it for an OpenAI version I get this:

Using OpenAI with a different host API:
`[openai] Error: 1 validation error for LLMResultChunk model Input should be a valid string [type=string_type, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.9/v/string_type`

Using Ollama with a different host API:
`{ "text": "", ..., "finish_reason": "Non-JSON encountered.", ...}`
I have the same issue
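For what it's worth, the Pydantic error above indicates the OpenAI-compatible endpoint returned a streaming chunk whose message content was `None`, while `LLMResultChunk` expects a string. Here is a self-contained illustration of that failure mode; the `Message` model is a stand-in for the relevant field, not Dify's actual class:

```python
# Stand-in for LLMResultChunk's string-typed message content: a null
# delta from an OpenAI-compatible server fails Pydantic validation.
from pydantic import BaseModel, ValidationError


class Message(BaseModel):
    content: str


delta_content = None  # what some servers send in the final streaming chunk

try:
    Message(content=delta_content)
except ValidationError as e:
    print(e)  # 1 validation error ... Input should be a valid string

# Coercing None to an empty string before validation avoids the error.
print(Message(content=delta_content or ""))
```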