What if the inverted index recorded only the mark number for rows in the same granule?
In the current implementation of the inverted index, the posting list stores the row IDs where a term appears. At query time, the row ID range of each index granule is matched against the posting list to check whether the granule contains any of the term's row IDs. For a matching granule, mayBeTrueOnGranuleInPart returns true.
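To make the query-time check concrete, here is a minimal sketch in Python rather than ClickHouse's C++. The function name `granule_may_match`, the fixed granule size of 8192 rows, and the sample posting list are all assumptions for illustration and are not taken from the ClickHouse source.

```python
from bisect import bisect_left

def granule_may_match(posting_list, row_begin, row_end):
    """Return True if the sorted posting list has a row ID in [row_begin, row_end)."""
    # Find the first posting that is >= row_begin.
    i = bisect_left(posting_list, row_begin)
    # The granule matches if that posting also falls before row_end.
    return i < len(posting_list) and posting_list[i] < row_end

# Hypothetical data: the term appears in rows 3, 8200 and 8201.
postings = [3, 8200, 8201]
granule_size = 8192  # assumed fixed granule size

matches = [
    granule_may_match(postings, g * granule_size, (g + 1) * granule_size)
    for g in range(3)
]
print(matches)  # [True, True, False]
```

Only the first two granules contain a row ID from the posting list, so the third can be skipped.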
If the posting list recorded only a granule ID (i.e. the mark number) for all rows in a granule, its cardinality could be greatly reduced.
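A rough illustration of the cardinality reduction, as a Python sketch. It assumes a fixed granule size of 8192 rows so that the granule ID is simply `row_id // granule_size`; real parts may use adaptive granularity, in which case the mapping would come from the marks instead. The helper name `to_granule_postings` is hypothetical.

```python
def to_granule_postings(row_ids, granule_size):
    """Collapse a row-ID posting list into a sorted list of distinct granule IDs."""
    # Assumption: fixed-size granules, so granule ID = row ID // granule size.
    return sorted({row_id // granule_size for row_id in row_ids})

# Hypothetical term that appears in every second row of the first 20000 rows.
row_ids = list(range(0, 20000, 2))
granule_ids = to_granule_postings(row_ids, 8192)

print(len(row_ids))  # 10000
print(granule_ids)   # [0, 1, 2]
```

For a frequent term the posting list shrinks from one entry per matching row to at most one entry per granule.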
Your idea is sound. We previously implemented and tested a modification based on it. On our machine, the performance of the inverted index was equivalent to that of a primary key query.
When the inverted index is written to storage, granule.mark_number is stored instead of the row ID. At query time, filtering is done by MarkRange.
Please check #62706 to see whether the proposed improvement is appropriate.
Mapping terms to granule IDs will make the inverted index behave like a bloom filter with zero false-positive rate. So I turned to using a 'divisor'.
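The thread does not spell out how the 'divisor' works. One plausible reading, assumed here and not confirmed by the source, is that the posting list stores `row_id // divisor`, so the divisor tunes the trade-off between posting list size and row-level precision. The names `compress_postings` and `bucket_row_range` are hypothetical.

```python
def compress_postings(row_ids, divisor):
    """Store one entry per bucket of `divisor` consecutive rows (assumed scheme)."""
    return sorted({row_id // divisor for row_id in row_ids})

def bucket_row_range(bucket, divisor):
    """Return the half-open row range [begin, end) covered by a stored bucket."""
    return bucket * divisor, (bucket + 1) * divisor

# Hypothetical posting list for one term.
row_ids = [3, 5, 70, 8200, 8201]

exact = compress_postings(row_ids, 1)          # divisor 1 keeps every row ID
coarse = compress_postings(row_ids, 64)        # one entry per 64-row bucket
per_granule = compress_postings(row_ids, 8192) # same as granule IDs if granules hold 8192 rows

print(exact)        # [3, 5, 70, 8200, 8201]
print(coarse)       # [0, 1, 128]
print(per_granule)  # [0, 1]
print(bucket_row_range(128, 64))  # (8192, 8256)
```

Under this assumption, a divisor of 1 reproduces the row-ID posting list, a divisor equal to the granule size reproduces the granule-ID variant, and values in between keep some sub-granule resolution.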