deeplake
[FEATURE] Support Chinese/CJK text search in BM25 indexing
Description
Currently, DeepLake's BM25 text search works well with English but doesn't properly handle Chinese/CJK text. When searching for Chinese characters, even exact matches fail to return the expected results.
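The thread doesn't say how DeepLake tokenizes text internally, but one common cause of this symptom is whitespace tokenization. Chinese and Japanese are written without spaces, so a whole sentence becomes a single token, and a query for part of that sentence never matches it. The sketch below is a minimal, stdlib-only illustration of that failure and of one standard workaround (overlapping character bigrams for CJK runs). The names `whitespace_tokenize`, `cjk_bigram_tokenize` and `bm25_scores` are hypothetical and are not DeepLake's API.

```python
import math
import re
from collections import Counter

# Hypothetical helpers for illustration only; not DeepLake's implementation.
# A "run" is either consecutive CJK characters (Hiragana/Katakana, CJK ideographs,
# Hangul syllables) or consecutive ASCII letters/digits.
TOKEN_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]+|[a-z0-9]+")
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]+")


def whitespace_tokenize(text):
    """Naive tokenizer: splits only on whitespace, so unspaced CJK text stays one token."""
    return text.lower().split()


def cjk_bigram_tokenize(text):
    """Keeps Latin words whole and splits CJK runs into overlapping character bigrams."""
    tokens = []
    for run in TOKEN_RUN.findall(text.lower()):
        if CJK_RUN.fullmatch(run):
            if len(run) == 1:
                tokens.append(run)  # a lone CJK character is its own token
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run)
    return tokens


def bm25_scores(query, docs, tokenize, k1=1.5, b=0.75):
    """Scores each document against the query with Okapi BM25 using the given tokenizer."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(doc_tokens)
    avgdl = (sum(len(t) for t in doc_tokens) / n) or 1.0  # guard against all-empty docs
    df = Counter()
    for toks in doc_tokens:
        df.update(set(toks))  # document frequency: count each term once per document
    scores = []
    for toks in doc_tokens:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * freq * (k1 + 1) / (freq + norm)
        scores.append(score)
    return scores
```

With `docs = ["深度学习数据库", "向量检索引擎", "deep learning database"]`, the query `"数据库"` scores zero against every document under `whitespace_tokenize`, while `cjk_bigram_tokenize` gives the first document a positive score. Bigrams trade a larger index for recall without needing a dictionary; a word segmenter is the usual alternative when precision matters more.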
Use Cases
No response
Hey @shellphy,
Sorry for the late response. We are currently working on supporting other languages in our BM25 search. We'll let you know once it's available.
Hey @shellphy, support for Unicode characters was added in deeplake==4.2.7. Can you please try it and confirm that it works? Thanks!