patricia_tree
Is it possible to make keys Unicode-safe?
Hey there. Thanks for this library. I'm trying to build JSON from the tree, but it fails on Unicode characters because a multibyte character can get split across nodes of the tree. It would be nice to have Unicode support that doesn't break multibyte characters apart. Thoughts?
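A minimal sketch of the failure mode, assuming a tree that branches at the first differing byte (as a tree with `u8` keys does). `common_prefix_len` is a hypothetical helper, not part of `patricia_tree`. The keys "café" and "cafè" differ only in their last character, but `é` (`C3 A9`) and `è` (`C3 A8`) share the lead byte `0xC3`, so the shared byte prefix ends in the middle of a character:

```rust
/// Hypothetical helper: length of the longest common prefix of two byte slices,
/// i.e. where a byte-keyed tree would place a branch.
fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

fn main() {
    let a = "café"; // 'é' is encoded as C3 A9
    let b = "cafè"; // 'è' is encoded as C3 A8
    let n = common_prefix_len(a.as_bytes(), b.as_bytes());
    println!("shared prefix: {} bytes", n); // shared prefix: 4 bytes

    // The shared label ends with the lone lead byte 0xC3, so it is not valid UTF-8...
    println!("prefix valid: {}", std::str::from_utf8(&a.as_bytes()[..n]).is_ok()); // prefix valid: false
    // ...and neither is the remaining suffix (just the continuation byte 0xA9).
    println!("suffix valid: {}", std::str::from_utf8(&a.as_bytes()[n..]).is_ok()); // suffix valid: false
}
```

Any code that turns node labels back into `&str` (for example, to emit JSON keys) will hit exactly this invalid split.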
Hi, this is an interesting question.
Since the memory layout of `patricia_tree` assumes the key type of a node to be `u8` to keep the memory footprint small, it's difficult to support Unicode characters directly as keys.
However, it might be possible by restricting new sibling nodes (tree branches) so that they are only inserted at points where UTF-8 character boundaries are preserved (I think we would need to adjust the return value of the `skip_common_prefix` method here).
I'd like to consider this in more detail when I have time (maybe this or next weekend).
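The boundary restriction can be sketched in isolation: compute the byte-level common prefix as usual, then round it down to the nearest UTF-8 character boundary using `str::is_char_boundary`. This is a standalone illustration under that assumption, not the actual `skip_common_prefix` code; `char_safe_common_prefix_len` is a hypothetical name.

```rust
/// Hypothetical helper: byte length of the longest common prefix of `a` and `b`,
/// rounded down to the nearest UTF-8 character boundary.
fn char_safe_common_prefix_len(a: &str, b: &str) -> usize {
    // Raw byte-level common prefix, as a `u8`-keyed tree would compute it.
    let mut n = a
        .bytes()
        .zip(b.bytes())
        .take_while(|(x, y)| x == y)
        .count();
    // Back off while `n` points into the middle of a character. Because `a` and `b`
    // agree on bytes `..n` and both are valid UTF-8, a boundary in `a` is also one in `b`.
    while !a.is_char_boundary(n) {
        n -= 1;
    }
    n
}

fn main() {
    for (a, b) in [("café", "cafè"), ("本", "日"), ("apple", "apply")] {
        let n = char_safe_common_prefix_len(a, b);
        // Slicing at `n` cannot panic: it is a char boundary in both keys,
        // so the shared label and both remaining labels are valid `&str`s.
        println!("{:?} | {:?} / {:?}", &a[..n], &a[n..], &b[n..]);
    }
}
```

For "café" and "cafè" the branch moves from byte 4 back to byte 3, so the shared label is "caf" and the children are "é" and "è". For "本" (`E6 9C AC`) and "日" (`E6 97 A5`) it moves back to 0, so no single-byte parent node is created. The nodes themselves still store `u8` labels, which is why this approach would keep the existing memory layout.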
Sounds good. I was actually thinking that instead of `Vec<u8>`, the key type could be `Vec<K>`, where `K: Clone + PartialEq + PartialOrd` (and maybe something else). That way it could be another numeric type, a `char`, or anything else. I tried to do it myself but couldn't.
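A minimal sketch of the generic-key idea, assuming keys are slices `&[K]` of some element type `K` (the ordering trait in `std` is `PartialOrd`, whose method is `partial_cmp`). Finding a branch point only needs `PartialEq`; ordering would matter for keeping sibling nodes sorted. The helper below is hypothetical and independent of `patricia_tree`'s actual API:

```rust
/// Hypothetical generic helper: number of leading elements shared by `a` and `b`.
fn common_prefix_len<K: PartialEq>(a: &[K], b: &[K]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

fn main() {
    // Keys as sequences of `char`: a branch can never split a character.
    let a: Vec<char> = "café".chars().collect();
    let b: Vec<char> = "cafè".chars().collect();
    let n = common_prefix_len(&a, &b);
    let shared: String = a[..n].iter().collect();
    println!("{} chars shared: {:?}", n, shared); // 3 chars shared: "caf"

    // The same helper works for any element type, e.g. `u32`.
    let x = [1u32, 2, 3, 4];
    let y = [1u32, 2, 9];
    println!("{}", common_prefix_len(&x, &y)); // 2
}
```

The trade-off is memory: a `char` is 4 bytes, so storing labels as `Vec<char>` takes up to four times the space of the equivalent UTF-8 bytes, which is the footprint the `u8` layout was designed to avoid.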
[FYI] I started implementing this feature in #15 (this PR is still at a very early stage, though).
Resolved by #15