rust_radix_trie
rust_radix_trie copied to clipboard
Let any vector be the key of the trie.
The idea:
The most natural thing for the prefix implementation is Vec<T>
. The Trie is currently implemented through serialization of the key value into Vec<u8>
.
Could be great if Vec<T>
was usable as 'plain' key, without making it into Vec<u8>
, probably with some limitations to T. Such limitations could probably be some kind of TrieKeyItem
trait, meaning that a type can be a single item of a key making TrieKey
into TrieKey<Item=Something>
.
This brings alot of convenience and univerality to the datastructure. Strings, for example could be stored as bytes or unicode symbols depending of user needs.
The Vec<u8>
in this case could be made into some external thing and only be used for types that could not be easily converted into corresponding vector implementation.
It seems to me that first step could be to "just" convert u8 to some internal generic type, probably not seen to user, but being replaced to Item type when the Vec(Iterable?, Indexable?) has been provided as a key.
The fallback:
In the case the idea below is impossible(in the context of this library of course), we could probably try implementing TrieKey
for Vec<T: TrieKey>
at least. That will probably require some unneeded computation when converting from/into such vectors but will still be very useful.
This is a good idea. We just need to think of a way to do it simply and efficiently. One option would be to encode each portion of the key as bytes and store something like a Vec<Vec<u8>>
on each "edge" of the trie (the TrieNode.key
field at the moment). We would have to deal with keys getting cut in half under this scheme, as its quite natural to want a trie keyed by something like Vec<String>
. For example, for URL routing, you could encode a path like "/download/file1.txt"
as vec!["download", "file1.txt"]
, which would split a key if "/download/file2.txt"
were already in the trie.
Another way I've thought about implementing this is to define a separator byte that can be used to delineate the different elements of the vector. For filesystem and URL paths, the byte encoding of /
could be used, as it's guaranteed not to appear in paths.
Well, my thought is that is more of a user's concern to decide the split point of his/her data. Wouldn't it be more natural and useful to ask a library user to provide data already split as Vec or, if possible as an Iterable object(it is possible btw?).
In the example of Vec<String> I think we should consider user wanting his strings to be the atomic part of the tree. If user wanted his strings to be split by some byte, or any other criteria, it's his/her decision to give us a Vec
Maybe something like the sequence trie API? https://docs.rs/sequence_trie/0.2.1/sequence_trie/
That depends, if the radix trie itself is compatible with iterable structures. The other question is effectiveness of iteration-only structure and it's compatibility with different optimizations done by llvm, like vectorizations etc.