PruningRadixTrie
PruningRadixTrie copied to clipboard
[BUG] UpdateMaxCounts can contaminate the sorted state
The UpdateMaxCounts
function updates the termFrequencyCountChildMax
field, which is also the field by which the Node.Children
Lists are sorted. Since the sort
call comes before the UpdateMaxCounts
call in the following code, each layer is not actually guaranteed to be in sorted order.
https://github.com/wolfgarbe/PruningRadixTrie/blob/40b01f3402a8274ce7b0f0eb71c68278590f79b7/PruningRadixTrie/PruningRadixTrie.cs#L126-L136
And there is no sort
call after this UpdateMaxCounts
:
https://github.com/wolfgarbe/PruningRadixTrie/blob/40b01f3402a8274ce7b0f0eb71c68278590f79b7/PruningRadixTrie/PruningRadixTrie.cs#L56-L61
In practice, this can subtly worsen performance.
Also, because UpdateMaxCounts
updates termFrequencyCountChildMax
in potentially all ancestor nodes, those layers would need to be sorted as well if every layer is supposed to maintain sorted order.
Thanks. I will look into it.