[Question]: Messy code after parsing rare or traditional Chinese characters

Open shuzj opened this issue 11 months ago • 3 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (Language Policy).
  • [x] Non-English title submissions will be closed directly (Language Policy).
  • [x] Please do not modify this template :) and fill in all the required fields.

Describe your problem

Embedding model: BAAI/bge-large-zh-v1.5
Chunk method: Table

Names that contain rare Chinese characters come out garbled in the chunks after parsing.

shuzj avatar Mar 30 '25 14:03 shuzj

@shuzj Hello! I'm here to help you with any bugs, questions, or contributions you may have. I'm Dosu, a bot 🤖.

I couldn't find specific information in the RAGFlow repository about handling rare or traditional Chinese characters during parsing. However, the embedding model BAAI/bge-large-zh-v1.5 is used to convert chunks of files into embeddings, which is crucial for the retrieval-augmented generation process [1][2]. If the issue persists, you might want to check if there are any updates or configuration options in the repository that could address this problem.

dosubot[bot] avatar Mar 30 '25 14:03 dosubot[bot]

Could you specify an example here?

KevinHuSh avatar Mar 31 '25 05:03 KevinHuSh

> Could you specify an example here?

Examples of Chinese names and how they appear after parsing:

  • 谢璟熤 is parsed as 谢Z
  • 王昳梵 is parsed as 王i梵
  • 朱凌玥 is parsed as 朱凌h
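
To narrow down which characters are affected, a small diagnostic along these lines may help. It is only a sketch that compares each source name with its parsed form and prints the code point of every character that did not survive. It uses the Python standard library only and is not part of RAGFlow; the function name `lost_characters` is hypothetical.

```python
import difflib
import unicodedata


def lost_characters(original: str, parsed: str) -> list[str]:
    """Return the characters of `original` that are missing or replaced in `parsed`."""
    matcher = difflib.SequenceMatcher(None, original, parsed, autojunk=False)
    lost = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" both mean original[i1:i2] did not survive parsing.
        if tag in ("replace", "delete"):
            lost.extend(original[i1:i2])
    return lost


# Pairs taken from the examples in this report: (name in the source, name in the chunk).
samples = [("谢璟熤", "谢Z"), ("王昳梵", "王i梵"), ("朱凌玥", "朱凌h")]

for original, parsed in samples:
    for ch in lost_characters(original, parsed):
        # Print the code point and Unicode name so the character can be looked up.
        name = unicodedata.name(ch, "unknown")
        print(f"{original} -> {parsed}: lost {ch} (U+{ord(ch):04X}, {name})")
```

Listing the lost code points makes it easier to check whether they share a property, for example coverage in a particular encoding or font, without assuming in advance which step of the pipeline is responsible.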

shuzj avatar Apr 01 '25 10:04 shuzj