[Question]: Garbled text after parsing rare or traditional Chinese characters
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
Embedding model: BAAI/bge-large-zh-v1.5
Chunk method: Table
Names containing rare Chinese characters come out garbled in the chunks after parsing.
@shuzj Hello! I'm here to help you with any bugs, questions, or contributions you may have. I'm Dosu, a bot 🤖.
I couldn't find specific information in the RAGFlow repository about handling rare or traditional Chinese characters during parsing. However, the embedding model BAAI/bge-large-zh-v1.5 is used to convert chunks of files into embeddings, which is crucial for the retrieval-augmented generation process [1][2]. If the issue persists, you might want to check if there are any updates or configuration options in the repository that could address this problem.
Could you specify an example here?
Examples of Chinese names and how they come out after parsing:
- 谢璟熤 is parsed as 谢Z
- 王昳梵 is parsed as 王i梵
- 朱凌玥 is parsed as 朱凌h
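One way to narrow this down is to check whether the affected characters can be represented in the encodings the source file might pass through. The snippet below is only a diagnostic sketch: the helper name `unencodable_chars` and the list of candidate encodings are assumptions for illustration, and nothing in this thread confirms that an encoding step in the parser is the actual cause.

```python
def unencodable_chars(text: str, encoding: str) -> list[str]:
    """Return the characters in `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Names reported in this issue; the candidate encodings are a guess.
names = ["谢璟熤", "王昳梵", "朱凌玥"]
for enc in ("gb2312", "gbk", "gb18030", "utf-8"):
    for name in names:
        print(enc, name, unencodable_chars(name, enc))
```

If the garbled characters turn out to be exactly the ones a narrower encoding cannot represent, that would suggest checking how the source table was saved or decoded before it reaches the Table chunker. If every encoding reports an empty list, the problem more likely lies elsewhere in parsing.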