icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

ZeroTrie types simplification

Open sffc opened this issue 2 years ago • 1 comments

Currently we have 4 types:

  • ZeroTrieSimpleAscii
    • Supports ASCII only (so that it can be const constructed)
    • Does not use the perfect hash function (better for small data)
    • Does not support span nodes (because ASCII does not need them)
    • Maximum size is 2^32
  • ZeroTriePerfectHash
    • Supports arbitrary bytes, including UTF-8
    • Uses the perfect hash function for branch nodes with at least 16 children
    • Uses span nodes
    • Maximum size is 2^32 (slight performance win)
  • ZeroTrieExtendedCapacity
    • Same as ZeroTriePerfectHash but there is no size limit
  • ZeroTrie
    • An enum over the three above types

I think we should consolidate these down to two types:

  • ZeroAsciiTrie == ZeroTrieSimpleAscii
  • ZeroBytesTrie == ZeroTrieExtendedCapacity
  • Remove ZeroTriePerfectHash because the ~5% perf benefit isn't worth keeping around a separate type
  • Remove ZeroTrie because the type of trie should be determined by the programmer based on the nature of the data, not via a runtime switch

OK?

  • [x] @Manishearth
  • [x] @robertbastian

sffc avatar Dec 05 '23 17:12 sffc

Worth noting that the byte pattern of a ZeroAsciiTrie can be consumed by a ZeroBytesTrie which means it is forwards-compatible to switch between them.

sffc avatar Dec 05 '23 21:12 sffc