patricia_tree

Is it possible to make keys Unicode-safe?

Open seppo0010 opened this issue 2 years ago • 3 comments

Hey there. Thanks for this library. I'm trying to create JSON from the tree, but it fails with Unicode characters because they can get split across nodes in the tree. It would be nice to have Unicode support that doesn't break up multibyte characters. Thoughts?

seppo0010, Nov 09 '21 12:11
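To illustrate the problem described above, here is a minimal sketch in plain Rust. It does not use patricia_tree itself; `byte_prefix_len` is a hypothetical helper that mimics how a byte-oriented radix tree would pick a split point between two keys.

```rust
/// Hypothetical helper (not part of patricia_tree): length of the common
/// prefix of two byte strings, which is where a byte-keyed tree would branch.
fn byte_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

/// Returns true if cutting `key` at byte offset `n` leaves a prefix that is
/// still valid UTF-8 on its own.
fn prefix_is_valid_utf8(key: &str, n: usize) -> bool {
    std::str::from_utf8(&key.as_bytes()[..n]).is_ok()
}
```

For example, "é" (bytes `C3 A9`) and "ë" (bytes `C3 AB`) share only the lead byte `C3`, so a byte-level split leaves a one-byte prefix that is not valid UTF-8 and cannot be serialized as a JSON string.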

Hi, this is an interesting question. Since the memory layout of patricia_tree assumes the key type of a node to be u8 in order to reduce the memory footprint, it's difficult to directly support Unicode characters as keys. However, it might be possible by restricting the insertion of new sibling nodes (tree branches) to points where the UTF-8 character boundary is preserved (I think we would need to adjust the return value of the skip_common_prefix method here). I'd like to consider this in more detail when I have time (maybe this weekend or next).

sile, Nov 09 '21 23:11
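The boundary-preserving idea in the comment above could look roughly like the following sketch. This is an assumption about the approach, not the library's actual `skip_common_prefix` code: `common_prefix_len_utf8` is a hypothetical function that computes the byte-level common prefix and then backs off to the nearest UTF-8 character boundary, so keys stay stored as bytes.

```rust
/// Hypothetical sketch (not patricia_tree's implementation): length in bytes
/// of the longest common prefix of `a` and `b` that ends on a UTF-8
/// character boundary.
fn common_prefix_len_utf8(a: &str, b: &str) -> usize {
    // Byte-level common prefix, as a byte-keyed tree would compute it.
    let mut n = a.bytes().zip(b.bytes()).take_while(|(x, y)| x == y).count();
    // Back off while the cut would land inside a multibyte character.
    // Offset 0 is always a boundary, so the loop terminates.
    while !a.is_char_boundary(n) {
        n -= 1;
    }
    n
}
```

Because the first `n` bytes of both keys are identical, a boundary in `a` is also a boundary in `b`, so checking one key is enough. With this adjustment, "é" and "ë" would branch at offset 0 rather than in the middle of the character.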

Sounds good. I was actually thinking that instead of Vec<u8>, the type of the keys could be Vec<K>, where K is Clone + PartialEq + PartialOrd, and maybe something else. That way it could be another numeric type, a char, or anything else. I tried to do it myself but I couldn't.

seppo0010, Nov 09 '21 23:11
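As a rough sketch of the generic-key idea above, the prefix comparison at the core of a radix tree only needs element equality, so it can be written over any `K: PartialEq`. `generic_prefix_len` is a hypothetical name, and this says nothing about how the tree's node layout would have to change.

```rust
/// Hypothetical sketch: common prefix length over an arbitrary element type.
/// With `K = char`, a split can never fall inside a character.
fn generic_prefix_len<K: PartialEq>(a: &[K], b: &[K]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

/// Converts a string into a `Vec<char>` key for use with the function above.
fn char_key(s: &str) -> Vec<char> {
    s.chars().collect()
}
```

The trade-off is size: a Rust `char` occupies 4 bytes, compared with 1 byte for `u8`, which runs against the memory-footprint goal mentioned earlier in the thread.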

[FYI] I started to implement this feature in #15 (this PR is still at a very early stage, though).

sile, Nov 15 '21 10:11

Resolved by #15

sile, Jan 06 '23 01:01