[FEATURE] Support Chinese/CJK text search in BM25 indexing

Open shellphy opened this issue 9 months ago • 2 comments

Description

Currently, DeepLake's BM25 text search works well with English but doesn't properly handle Chinese/CJK text. When searching for Chinese text, even exact matches fail to return the expected results.
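To illustrate the kind of failure described here, below is a minimal, self-contained sketch of one common cause: a tokenizer that splits only on whitespace treats an unsegmented Chinese sentence as a single token, so a query for a substring never matches any indexed term. This is not DeepLake's implementation, and whether this is the actual cause inside DeepLake is an assumption. The helpers `whitespace_tokens`, `cjk_aware_tokens`, and `bm25_scores` are hypothetical names written for this example, and character bigrams are just one of several possible CJK tokenization strategies.

```python
import math
import re
from collections import Counter

# Runs of CJK ideographs (basic block only; a simplification for this sketch).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def whitespace_tokens(text):
    """Naive tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


def cjk_aware_tokens(text):
    """Split on whitespace, then break each CJK run into overlapping bigrams."""
    tokens = []
    for chunk in text.lower().split():
        pos = 0
        for match in CJK_RUN.finditer(chunk):
            if match.start() > pos:
                tokens.append(chunk[pos:match.start()])  # non-CJK prefix
            run = match.group()
            if len(run) == 1:
                tokens.append(run)
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
            pos = match.end()
        if pos < len(chunk):
            tokens.append(chunk[pos:])  # non-CJK suffix
    return tokens


def bm25_scores(query, docs, tokenize, k1=1.5, b=0.75):
    """Score every document against the query with a plain BM25 formula."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(docs)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["我喜欢机器学习", "今天天气很好", "deep learning is fun"]
query = "机器学习"

naive = bm25_scores(query, docs, whitespace_tokens)
bigram = bm25_scores(query, docs, cjk_aware_tokens)
print("whitespace tokenizer:", naive)
print("bigram tokenizer:    ", bigram)
```

With the whitespace tokenizer, every document scores zero for the query because the whole sentence is indexed as one term. With bigram tokenization, only the first document gets a positive score.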

Use Cases

No response

Apr 12 '25 09:04 shellphy

Hey @shellphy,

Sorry for the late response. We are currently working on supporting other languages in our BM25 search. Will let you know once it's available.

May 13 '25 21:05 khustup2

Hey @shellphy, support for Unicode characters was added in deeplake==4.2.7. Can you please try it and confirm that it works? Thanks!

Jun 03 '25 11:06 khustup2