PruningRadixTrie icon indicating copy to clipboard operation
PruningRadixTrie copied to clipboard

[BUG] UpdateMaxCounts can contaminate the sorted state

Open Validark opened this issue 3 years ago • 1 comments

The UpdateMaxCounts function updates the termFrequencyCountChildMax field, which is also the field by which the Node.Children Lists are sorted. Since the sort call comes before the UpdateMaxCounts call in the following code, each layer is not actually guaranteed to be in sorted order.

https://github.com/wolfgarbe/PruningRadixTrie/blob/40b01f3402a8274ce7b0f0eb71c68278590f79b7/PruningRadixTrie/PruningRadixTrie.cs#L126-L136

And there is no sort call after this UpdateMaxCounts: https://github.com/wolfgarbe/PruningRadixTrie/blob/40b01f3402a8274ce7b0f0eb71c68278590f79b7/PruningRadixTrie/PruningRadixTrie.cs#L56-L61

In practice, this can subtly worsen performance.

Also, because UpdateMaxCounts updates termFrequencyCountChildMax in potentially all ancestor nodes, those layers would need to be sorted as well if every layer is supposed to maintain sorted order.

Validark avatar Jun 13 '21 10:06 Validark

Thanks. I will look into it.

wolfgarbe avatar Jun 14 '21 15:06 wolfgarbe