[Bug]:Information in Chinese cannot be searched in English

Open danbus opened this issue 10 months ago • 6 comments

Is there an existing issue for the same bug?

  • [x] I have checked the existing issues.

RAGFlow workspace code commit ID

d6836444c9cd87d79114298b40e6e4fb597cc4de

RAGFlow image version

v0.17.0

Other environment information


Actual behavior

Information in Chinese cannot be searched in English

Expected behavior

No response

Steps to reproduce

Vectorization model: cohere.embed-multilingual-v3, or any other model that supports multiple languages

Additional information

No response

danbus avatar Mar 06 '25 08:03 danbus

Hello. The PDF is in Chinese and the query is in English; a multilingual model is used as the embedding model, but no chunks are retrieved. Is that your problem?

zhangruilin2020 avatar Mar 06 '25 09:03 zhangruilin2020

Looking at the source code, we found the main cause: the vector query is run together with a filter built from the user's input text. Our guess is that this is meant to keep only closely matching content, but it means that documents in a language other than the query language are not supported. The core query structure is as follows: { query:{filter...} knn:{filter...} } I don't think a RAG system should sacrifice multi-language support to improve accuracy, but this could be an option that users can set.
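To illustrate the effect described above, here is a minimal toy sketch, not RAGFlow's actual code. The corpus, the 2-d vectors, and the helper names (`keyword_match`, `search`) are all hypothetical; the keyword check merely stands in for whatever text filter is applied alongside the kNN query.

```python
from math import sqrt

# Toy corpus: hypothetical chunks with made-up 2-d "embeddings".
CHUNKS = [
    {"text": "退货政策：三十天内可以退款", "vec": [0.9, 0.1]},
    {"text": "Shipping takes five business days", "vec": [0.1, 0.9]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def keyword_match(query, text):
    """Stand-in for the text filter: at least one query term must occur in the chunk."""
    return any(term in text.lower() for term in query.lower().split())


def search(query, query_vec, use_keyword_filter, k=1):
    """Optionally filter chunks by keyword, then rank the survivors by vector similarity."""
    candidates = [
        c for c in CHUNKS
        if not use_keyword_filter or keyword_match(query, c["text"])
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:k]]


query = "refund policy"
# Assumption: a multilingual model placed the English query near the Chinese chunk.
query_vec = [0.88, 0.12]

filtered = search(query, query_vec, use_keyword_filter=True)
unfiltered = search(query, query_vec, use_keyword_filter=False)
print(filtered)
print(unfiltered)
```

In this toy setup the English terms never occur in the Chinese chunk, so the filter removes it before the vector ranking has a chance to surface it; with the filter disabled, the Chinese chunk ranks first.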

danbus avatar Mar 07 '25 01:03 danbus

We will support that soon, as long as Multi-round optimization is enabled.

KevinHuSh avatar Mar 07 '25 05:03 KevinHuSh

@KevinHuSh - Can you please elaborate? Are you planning to create a pull request with this feature, or has it already been created so that I just need to pull the latest nightly build? As I initially mentioned in #4503, my use case is that I might have documents in multiple languages about the same topic/product, and I want to be able to a) put them in one knowledge base and b) ask a question in one language, effectively getting a response from all documents.

As an alternative, I would suggest a feature that allows adding 2 knowledge bases to a chat that were created with different embedding models (e.g., the MTEB leaders for English and Chinese).
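One way such a two-knowledge-base setup could combine results is sketched below, purely as an illustration and not an existing RAGFlow feature. Similarity scores from different embedding models are not directly comparable, so this sketch merges by rank using reciprocal rank fusion; the chunk IDs and the function name are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked ID lists; each list adds 1 / (k + rank) to an item's score."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical top hits from two knowledge bases built with different embedding models.
english_kb_hits = ["en-12", "en-07", "en-31"]
chinese_kb_hits = ["zh-03", "zh-11", "zh-19"]

merged = reciprocal_rank_fusion([english_kb_hits, chinese_kb_hits])
print(merged)
```

Because only ranks are used, neither model's score scale dominates the merged list.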

senovr avatar Mar 09 '25 09:03 senovr

It has not been implemented yet.

KevinHuSh avatar Mar 10 '25 05:03 KevinHuSh

Hi @KevinHuSh, can you please let me know what your plans are for working on multi-round optimization? If @danbus or I work on this particular issue, will you accept an MR?
@danbus - can you please point me to the file where the sorting you mentioned happens?

senovr avatar Apr 03 '25 04:04 senovr

When will this bug be fixed?

THEBEST-cloud avatar Apr 16 '25 07:04 THEBEST-cloud

@yongtenglei @KevinHuSh Could you please provide an update? As of 17.2 there is still no multilingual search.

senovr avatar Apr 23 '25 19:04 senovr