[Bug]: Information in Chinese cannot be searched in English
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
RAGFlow workspace code commit ID
d6836444c9cd87d79114298b40e6e4fb597cc4de
RAGFlow image version
v0.17.0
Other environment information
Actual behavior
Chinese-language content cannot be found when the query is in English.
Expected behavior
No response
Steps to reproduce
Embedding model: cohere.embed-multilingual-v3, or any other model that supports multiple languages
Additional information
No response
Hello. The PDF is in Chinese and the query is in English. A multilingual model is used as the embedding model, but the search returns no chunks. Is this a problem on your side?
Looking at the source code, we found the main cause: the vector query runs together with a filter built from the user's input. Our guess is that the filter is meant to keep only content with a higher degree of match, but as a side effect, documents written in a language other than the query language cannot be retrieved. The core query structure is as follows: { query:{filter...} knn:{filter...} } I don't think a RAG system should sacrifice multilingual support to improve accuracy, but this could be offered as an option that users can configure.
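To make the structure above concrete, here is a minimal sketch of an Elasticsearch-style hybrid search body with a switch for the text filter on the kNN clause. This is not RAGFlow's actual code: the function name `build_search_body`, the field names `content` and `embedding`, and the `filter_knn_by_text` flag are all hypothetical and only illustrate the idea of making the filter optional.

```python
from typing import Any


def build_search_body(
    query_text: str,
    query_vector: list[float],
    filter_knn_by_text: bool = True,
    top_k: int = 10,
) -> dict[str, Any]:
    """Build an Elasticsearch-style hybrid search body (illustrative only)."""
    # Keyword clause built from the raw user input.
    # "content" is a hypothetical text field name.
    text_match = {"match": {"content": query_text}}

    knn: dict[str, Any] = {
        "field": "embedding",  # hypothetical vector field name
        "query_vector": query_vector,
        "k": top_k,
        "num_candidates": top_k * 10,
    }
    if filter_knn_by_text:
        # Behavior described in this issue: vector candidates must also
        # match the query text, so an English query cannot reach chunks
        # that contain only Chinese text.
        knn["filter"] = {"bool": {"must": [text_match]}}

    return {
        "query": {"bool": {"must": [text_match]}},
        "knn": knn,
        "size": top_k,
    }


# With the flag off, the kNN clause is no longer restricted by the query text.
body = build_search_body("installation guide", [0.1, 0.2, 0.3], filter_knn_by_text=False)
print("filter" in body["knn"])
```

With the flag disabled, the vector side of the search depends only on embedding similarity, which is what a multilingual embedding model needs for cross-language retrieval. Whether this should be the default is a precision trade-off for the maintainers to decide.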
We will support that soon, provided that multi-round optimization is enabled.
@KevinHuSh - Could you please elaborate? Are you planning to create a pull request for this feature, or has it already been created, so that I only need to pull the latest nightly build? As I initially mentioned in #4503, my use case is that I may have documents in multiple languages about the same topic or product, and I want to be able to a) put them in one knowledge base and b) ask a question in one language and get answers drawn from all of the documents.
As an alternative, I would suggest a feature that allows adding two knowledge bases to a chat that were created with different embedding models (e.g., the MTEB leaders for English and for Chinese).
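One open question with this alternative is how to combine results, since similarity scores from two different embedding models are not directly comparable. A minimal sketch of one possible approach is reciprocal rank fusion, which merges ranked lists using only rank positions. This is an assumption for illustration, not an existing RAGFlow feature; the function name and the chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk IDs using reciprocal rank fusion.

    Each list is assumed to come from a different knowledge base,
    ordered best-first. Only ranks are used, never raw scores.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Higher-ranked items contribute more; k dampens the top ranks.
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by chunk ID.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from an English-tuned and a Chinese-tuned knowledge base.
english_kb_hits = ["a", "b", "c"]
chinese_kb_hits = ["b", "d"]
print(reciprocal_rank_fusion([english_kb_hits, chinese_kb_hits]))
# → ['b', 'a', 'd', 'c']
```

Because the fusion ignores raw scores, a chunk that appears in both lists ("b" here) rises to the top without any score normalization between the two models.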
It has not been fulfilled yet.
Hi @KevinHuSh, could you please let me know what your plans are for working on multi-round optimization?
If @danbus or I work on this particular issue, will you accept an MR?
@danbus - could you please point me to the file where the sorting you mentioned happens?
When will this bug be fixed?
@yongtenglei @KevinHuSh Could you please provide an update? As of 17.2 there is still no multilingual search.