
Explore using a trie to speed up both the json_match and json_extract_index transform functions for mutable JSON index segments

Open wirybeaver opened this issue 2 years ago • 0 comments

I noticed that _postingListMap was changed from a HashMap to a TreeMap in this PR: https://github.com/apache/pinot/pull/12568

jsonMatch performs a point search on _postingListMap, whereas jsonExtractIndex performs a prefix search on it. The TreeMap can speed up the prefix search but slows down the point search, since an exact lookup goes from expected O(1) in a HashMap to O(log n) in a TreeMap.
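To make the trade-off concrete, here is a minimal sketch of the two access patterns. It is not Pinot's code: the class name, the `path=value` key format, and the `int[]` posting lists are stand-ins chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; names and key format are assumptions.
class PostingListLookupSketch {

  // Point search (the jsonMatch pattern): one exact-key lookup.
  // Expected O(1) on a HashMap, O(log n) on a TreeMap.
  static int[] pointSearch(Map<String, int[]> postingLists, String key) {
    return postingLists.get(key);
  }

  // Prefix search (the jsonExtractIndex pattern): every entry whose key
  // starts with `prefix`. Keys sharing a prefix are contiguous in sorted
  // order, so a TreeMap can seek to the first candidate and stop at the
  // first non-matching key. A HashMap would have to scan every entry.
  static Map<String, int[]> prefixSearch(TreeMap<String, int[]> postingLists, String prefix) {
    Map<String, int[]> matches = new LinkedHashMap<>();
    for (Map.Entry<String, int[]> e : postingLists.tailMap(prefix, true).entrySet()) {
      if (!e.getKey().startsWith(prefix)) {
        break; // left the contiguous range of matching keys
      }
      matches.put(e.getKey(), e.getValue());
    }
    return matches;
  }
}
```

With keys `a.b=1`, `a.b=2` and `a.c=1`, `prefixSearch(map, "a.b=")` visits only the first two entries plus one extra key before stopping.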

Ideally, we could use a trie to speed up both functions. Moreover, we would not have to store the literal “$index” inside the keys of _postingListMap. Last but not least, we could use latch crabbing to increase concurrency: the existing read and write paths lock the whole map, which is a throughput bottleneck.

A rough idea of the Data structure below:

TrieNode {
  boolean arrayIndex;
  String subPath;
  Map<String, TrieNode> kids;
}
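
One way the rough idea above could be fleshed out is sketched below. Everything beyond the three fields from the issue is an assumption made for illustration: the class name, the `docIds` list standing in for a real posting list, the pre-split path segments, and the rule that an all-digit segment is an array position. Locking (latch crabbing) is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not an existing Pinot class.
class TrieNodeSketch {
  boolean arrayIndex;   // assumption: true when this segment is an array position
  String subPath;       // the path segment this node represents
  Map<String, TrieNodeSketch> kids = new HashMap<>();
  List<Integer> docIds = new ArrayList<>(); // stand-in for a posting list

  TrieNodeSketch(String subPath, boolean arrayIndex) {
    this.subPath = subPath;
    this.arrayIndex = arrayIndex;
  }

  // Walk (and create as needed) one node per segment, then record the doc id.
  void insert(String[] segments, int docId) {
    TrieNodeSketch node = this;
    for (String segment : segments) {
      node = node.kids.computeIfAbsent(
          segment, s -> new TrieNodeSketch(s, isArrayIndex(s)));
    }
    node.docIds.add(docId);
  }

  // Point search: one hash lookup per segment; null if the path is absent.
  TrieNodeSketch find(String[] segments) {
    TrieNodeSketch node = this;
    for (String segment : segments) {
      node = node.kids.get(segment);
      if (node == null) {
        return null;
      }
    }
    return node;
  }

  // Prefix search: locate the prefix node, then gather its whole subtree.
  List<Integer> collect(String[] prefix) {
    List<Integer> out = new ArrayList<>();
    TrieNodeSketch start = find(prefix);
    if (start != null) {
      start.collectInto(out);
    }
    return out;
  }

  private void collectInto(List<Integer> out) {
    out.addAll(docIds);
    for (TrieNodeSketch kid : kids.values()) {
      kid.collectInto(out);
    }
  }

  // Assumption: a segment made only of digits denotes an array position.
  static boolean isArrayIndex(String segment) {
    return !segment.isEmpty() && segment.chars().allMatch(Character::isDigit);
  }
}
```

In this shape a point search costs one map lookup per path segment regardless of how many keys are stored, and a prefix search touches only the subtree under the prefix. Marking array positions with the `arrayIndex` flag is one possible way to avoid embedding a literal such as “$index” in the keys.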

wirybeaver, Mar 13 '24 19:03