[Bug]: "embedding_model": "bge-m3@OpenAI-API-Compatible" is not support with dataset creation .
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
6ca1aef
RAGFlow image version
V0.19
Other environment information
Actual behavior
When creating a dataset via the API with the parameter "embedding_model": "bge-m3@OpenAI-API-Compatible", it returns: { "code": 101, "message": "Unsupported model: bge-reranker-v2-m3@GPUStack" }
It works normally with other factories, for example "embedding_model": "bge-m3@Jina".
Expected behavior
No response
Steps to reproduce
curl --location --request POST 'http://172.16.180.217/api/v1/datasets' \
--header 'Authorization: Bearer ragflow-IzMjZmMDJlNDVlZDExZjBhNGEyMDI0Mm' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "test_KB_Doc1111",
    "embedding_model": "bge-m3@Jina",
    "chunk_method": "naive",
    "parser_config": {
        "chunk_token_num": 128,
        "delimiter": "\\n",
        "layout_recognize": "DeepDOC",
        "html4excel": false,
        "raptor": {"use_raptor": false},
        "graphrag": {"use_graphrag": false}
    },
    "permission": "team",
    "pagerank": 0
}'
{
    "code": 101,
    "message": "Unsupported model: <bge-reranker-v2-m3@GPUStack>"
}
Additional information
No response
Hi @tangwenhuatommy! I'm Dosu and I’m helping the ragflow team.
The model identifier "bge-m3@OpenAI-API-Compatible" isn't supported for dataset creation in your RAGFlow setup because it isn't registered as a valid embedding model in the configuration files or backend service. RAGFlow does not natively expose BGE models such as bge-m3 or bge-reranker-v2-m3 as services, which is why you see the "Unsupported model" error. This is a known limitation: only certain models and providers are supported out of the box, and the configuration file does not list "bge-m3@OpenAI-API-Compatible" as a valid option.
If you want to use a custom embedding model, you need to add it to conf/llm_factories.json in the correct format and ensure the backend can recognize and interact with it. The model's API must match RAGFlow's expectations; otherwise you'll need to write an adapter service.
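If it helps, one quick way to check whether a given model@factory pair is registered is to read conf/llm_factories.json directly. The sketch below is a minimal helper; the field names it looks for (factory_llm_infos, name, llm, llm_name) are assumptions based on typical entries, so adjust them to match your copy of the file.

```python
import json

# Minimal sketch: report whether a model/factory pair appears in llm_factories.json.
# The field names ("factory_llm_infos", "name", "llm", "llm_name") are assumptions
# about the file's structure, not a guaranteed schema; adjust if your file differs.
def is_registered(config_path: str, model_name: str, factory: str) -> bool:
    with open(config_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    for fac in data.get("factory_llm_infos", []):
        if fac.get("name") != factory:
            continue
        for llm in fac.get("llm", []):
            if llm.get("llm_name") == model_name:
                return True
    return False

if __name__ == "__main__":
    print(is_registered("conf/llm_factories.json", "bge-m3", "OpenAI-API-Compatible"))
```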
As a workaround, you can use alternatives such as Ollama or Xinference to serve the embedding models, or deploy a "slim" version of the RAGFlow Docker image that doesn't include built-in embedding and rerank models.
If "bge-m3@Jina" works, it's because that provider is registered and supported in your configuration. For BGE models, using a supported provider like "Jina" or "VLLM" (e.g., "bge-m3@VLLM") is the recommended approach (example).
We sincerely apologize that this issue was introduced in #7451 and has now been fixed in #8069.
Currently, only version v0.19.0 is affected. You have two options to resolve this:
- Manually modify the corresponding code in docker and restart your Docker container
- Use the nightly Docker image instead
@asiroliu Is it recommended to use the nightly Docker image for production environments?
For production environments, we recommend directly modifying the code instead of using the nightly build, as it hasn't undergone comprehensive testing.
- Modify the code:
$ docker exec -it ragflow-server bash
$ vi api/utils/api_utils.py
- Refer to #8069 for the change to apply.
- Exit the Docker environment.
- Restart the ragflow-server container:
$ docker restart ragflow-server
I tried to modify the code as above, but specifying the custom model still gives an error.
Please verify:
- Whether the container has been restarted after code modifications
- Whether this LLM has been added via the '/user-setting/model' page
1. The container has been restarted after the code was modified.
2. The model has been added.
Excuse me, one more question. After setting a default embedding model, that default should be used when no embedding model is specified. Creating the knowledge base from the front-end page behaves this way, but when creating it via the Python API, why is the default model BAAI/bge-large-zh-v1.5@BAAI?
@Jialiang123 The current implementation seems somewhat unusual. Would you mind opening an issue to track this and assigning it to me? I'll fix this later: when no embedding model is specified, we should use the tenant's default embedding model.
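For illustration only, the fallback described above could look roughly like the sketch below; the function and argument names are hypothetical and this is not the actual RAGFlow code, just a sketch of the intended behavior.

```python
from typing import Optional

# Hypothetical sketch of the intended fallback; resolve_embedding_model and its
# arguments are illustrative names, not RAGFlow's real API.
def resolve_embedding_model(requested: Optional[str], tenant_default: str) -> str:
    """Use the caller-supplied model when present, otherwise the tenant's default."""
    return requested if requested else tenant_default

# No model supplied in the API request -> the tenant's default should be used.
print(resolve_embedding_model(None, "BAAI/bge-large-zh-v1.5@BAAI"))
# Explicit model supplied -> it takes precedence.
print(resolve_embedding_model("bge-m3@Jina", "BAAI/bge-large-zh-v1.5@BAAI"))
```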