trie icon indicating copy to clipboard operation
trie copied to clipboard

Update to latest libdatrie, move libdatrie to submodule, Unicode support

Open ggrossman opened this issue 2 years ago • 0 comments

Hi @tyler ,

I made some updates to this gem that hopefully you could consider incorporating.

  • The libdatrie source files baked into the fast_trie gem are quite old, version 0.1.99. This PR updates to the latest libdatrie at https://github.com/tlwg/libdatrie, version 0.2.13.
  • libdatrie is licensed under LGPL-2.1, but this gem is licensed under MIT. To clear up any licensing confusion, added LGPL-2.1 to the gemspec and removed all libdatrie source files. They are now pulled in via git submodule. Renamed the trie.c with the Ruby extension code to trie_ext.c
  • libdatrie now requires an AlphaMap specifying the alphabet for the trie. Created a new AlphaMap class in Ruby to represent this, and an AlphaMap can be passed in to Trie.new. If not specified, a default AlphaMap is used with range 0x00-0xff.
  • libdatrie now supports Unicode. Added support for Unicode. Strings are converted to UTF32 to map to libdatrie's AlphaChar type, and converted to utf8 on the way back.

Changes to Trie Ruby bindings:

  • Updated to use the TypedData_Wrap_Struct macros instead of Data_Wrap_Struct, etc.
  • Support for Marshal.load, Marshal.dump
  • There is now an iterator interface to libdatrie which probably didn't exist when this gem was written. Rewrote the walk and children* methods to use the trie_iterator interface. This got rid of a char prefix[1024] fixed-size buffer that was a potential buffer overflow.
  • The gem supported passing any Ruby VALUE as the weight for a key. This has become problematic because TrieData in libdatrie is declared as uint32_t, but on modern 64-bit systems, VALUE is 64 bits because it's the size of a pointer. I fixed this by removing the ability to specify any type for the weight but fixnum. Having any VALUE in there was also problematic for reading/writing to a file, since it would write memory pointers.
  • Added add? method to match trie_store_if_absent
  • Added concat method like https://github.com/gonzedge/rambling-trie to add a whole array at once, with added support for weights
  • Updated specs with Unicode test cases and tests for the new methods.

Thanks for writing this gem way back when!

ggrossman avatar Feb 13 '23 01:02 ggrossman