trie
trie copied to clipboard
Update to latest libdatrie, move libdatrie to submodule, Unicode support
Hi @tyler ,
I made some updates to this gem that hopefully you could consider incorporating.
- The libdatrie source files baked into the fast_trie gem are quite old, version 0.1.99. This PR updates to the latest libdatrie at https://github.com/tlwg/libdatrie, version 0.2.13.
- libdatrie is licensed under LGPL-2.1, but this gem is licensed under MIT. To clear up any licensing confusion, added LGPL-2.1 to the gemspec and removed all libdatrie source files. They are now pulled in via git submodule. Renamed the
trie.c
with the Ruby extension code totrie_ext.c
- libdatrie now requires an
AlphaMap
specifying the alphabet for the trie. Created a newAlphaMap
class in Ruby to represent this, and anAlphaMap
can be passed in toTrie.new
. If not specified, a defaultAlphaMap
is used with range 0x00-0xff. - libdatrie now supports Unicode. Added support for Unicode. Strings are converted to UTF32 to map to libdatrie's
AlphaChar
type, and converted to utf8 on the way back.
Changes to Trie Ruby bindings:
- Updated to use the
TypedData_Wrap_Struct
macros instead ofData_Wrap_Struct
, etc. - Support for
Marshal.load
,Marshal.dump
- There is now an iterator interface to libdatrie which probably didn't exist when this gem was written. Rewrote the
walk
andchildren*
methods to use thetrie_iterator
interface. This got rid of achar prefix[1024]
fixed-size buffer that was a potential buffer overflow. - The gem supported passing any Ruby
VALUE
as the weight for a key. This has become problematic becauseTrieData
in libdatrie is declared asuint32_t
, but on modern 64-bit systems,VALUE
is 64 bits because it's the size of a pointer. I fixed this by removing the ability to specify any type for the weight but fixnum. Having anyVALUE
in there was also problematic for reading/writing to a file, since it would write memory pointers. - Added
add?
method to matchtrie_store_if_absent
- Added
concat
method like https://github.com/gonzedge/rambling-trie to add a whole array at once, with added support for weights - Updated specs with Unicode test cases and tests for the new methods.
Thanks for writing this gem way back when!