[Question]: Multilingual support between embedding knowledge base, retrieval testing, search, and assistant chat

Open predoctech opened this issue 11 months ago • 9 comments

Describe your problem

As this project has a Chinese/English focus, I tried a bilingual test case. The source document is in Chinese:

Screenshot from 2025-01-16 12-31-16

Embedding is done with maidalun1020/bce-embedding-base_v1, which I understand to be a Bilingual and Crosslingual Embedding model. I am working under the assumption that, even though the source document is in Chinese, I will be able to perform retrieval testing, search, and chat in English as long as the semantic meaning of a chunk matches. Obviously the LLM deployed (Gemini) needs to be bilingual as well, which is the case.

However, that is not what I have experienced:

Retrieval testing: always returns "no data"

Search: no result

Screenshot from 2025-01-16 12-39-39

Chat: "Knowledge base is empty"

Screenshot from 2025-01-16 12-41-21

Please advise whether multilingual support is available in RAGFlow, or whether what I attempted wasn't the correct approach for this purpose. Thanks.

predoctech avatar Jan 16 '25 04:01 predoctech

I would second this question. I tried a multi-language use case (one knowledge base, documents in two languages, and e5-medium, a multilingual embedder). When I ask a question in English, only English documents are used for reference; when I ask in the second language, only the second-language documents are used.

senovr avatar Jan 16 '25 06:01 senovr

Multilingual search is not supported well so far.

KevinHuSh avatar Jan 17 '25 01:01 KevinHuSh

Upon further experiments I found that the limitation has more to do with the RAG process than with the LLM. Basically, an embedded vector from an English question will not retrieve any embedded vector of Chinese data, which leaves any subsequent LLM interaction irrelevant. However, according to the description of the BCEmbedding model:

"EmbeddingModel handle bilingual and crosslingual retrieval task in English and Chinese"

So why would this become a hurdle when the model is adopted and used within RAGFlow?

predoctech avatar Jan 17 '25 11:01 predoctech

Just thinking aloud: did you test the process outside of RAGFlow? Maybe the issue is in the embedding model and not on the RAGFlow side? I will also test some proprietary embedders available via API, and will come back if something interesting comes up.
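For anyone who wants to try this kind of check outside of RAGFlow, here is a minimal sketch of a retrieval harness. It is not RAGFlow's retrieval code. The names `retrieve` and `toy_embed` are hypothetical, and the vectors are made-up placeholders rather than real model output. To test a real model, you would pass your own function that maps text to a vector in place of `toy_embed`.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def retrieve(
    query: str,
    chunks: Sequence[str],
    embed: Callable[[str], Vector],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Rank chunks by cosine similarity to the query and keep the top_k."""
    q = normalize(embed(query))
    scored = []
    for chunk in chunks:
        c = normalize(embed(chunk))
        # The dot product of two unit vectors equals their cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, c)), chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical placeholder data: these vectors are invented for the demo
# and do NOT come from any embedding model.
QUERY_EN = "What is the refund policy?"
CHUNK_REFUND_ZH = "退款政策是什么"
CHUNK_WEATHER_ZH = "今天天气很好"
TOY_VECTORS = {
    QUERY_EN: [0.9, 0.1, 0.0],
    CHUNK_REFUND_ZH: [0.8, 0.2, 0.1],
    CHUNK_WEATHER_ZH: [0.0, 0.1, 0.9],
}


def toy_embed(text: str) -> Vector:
    """Stand-in for a real embedding call; swap in your own model here."""
    return TOY_VECTORS[text]


results = retrieve(QUERY_EN, [CHUNK_WEATHER_ZH, CHUNK_REFUND_ZH], toy_embed, top_k=1)
print(results)
```

If a real cross-lingual embedder ranks the matching Chinese chunk first in a harness like this but the same query returns nothing in RAGFlow, that would point at the retrieval pipeline rather than the embedding model.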

senovr avatar Jan 17 '25 11:01 senovr

"Multilingual search is not supported well so far."

Hi, what if I integrate a specific embedding model for my target language? I know this is possible via the GUI, and I can also deploy that model on the same host and it should work. But I also see the "language" parameter in the "knowledge base" configuration. Does it mean that one of these 3 languages will be used as an input parameter for the embedding model when a chunk is processed? If so, how can I add a 4th language to the list?

Also, a model like "all-MiniLM-L6-v2" supports different languages, and if we could specify the original text language as an input parameter for the embedding process, it could be an easy win for the time being. This is just a question in case such a workaround is possible; if yes, could you please help me find out where in the code it needs to be changed ...

thank you

faastore avatar Feb 05 '25 19:02 faastore

support

chminsc avatar Feb 11 '25 12:02 chminsc

+1

nvictorm avatar Feb 19 '25 10:02 nvictorm

Encountered the same problem.

danbus avatar Mar 06 '25 07:03 danbus

To verify this problem, I used the cohere.embed-multilingual-v3 embedding model to run a vectorization comparison test locally with a Python script. The similarity between the Chinese and English texts was 0.8 and 0.76, but the content still cannot be retrieved in RAGFlow using English.
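The script itself was not posted, so the following is only a rough sketch of what such a comparison could look like. It assumes the score is cosine similarity, which the comment does not state. The function name `cosine_similarity` and the example vectors are hypothetical placeholders; in a real test the two vectors would come from the embedding model's output for a Chinese text and its English counterpart.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder vectors, invented for illustration only. Replace them with
# the embeddings returned by your model for the Chinese and English texts.
zh_vec = [0.20, 0.70, 0.10]
en_vec = [0.25, 0.65, 0.05]

score = cosine_similarity(zh_vec, en_vec)
print(f"Chinese/English similarity: {score:.2f}")
```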

danbus avatar Mar 06 '25 07:03 danbus

Thank you so much for your suggestion! 🙏 We really appreciate your input.

In our latest version, we've added Multilingual support! 🌍✨ Please give it a try and let us know if it helps resolve your issue. We hope this feature provides a better experience for you!

Once again, thank you for your continued support of RAGFlow! We're always excited to hear your thoughts and ideas, and we look forward to making the product even better with your help. 😄 @predoctech

BadwomanCraZY avatar May 12 '25 08:05 BadwomanCraZY