[Feature] Introduce secondary index for paimon

Open leaves12138 opened this issue 1 year ago • 2 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Motivation

Until now, Paimon has relied on z-order and order sort compaction to speed up queries. After sort compaction, files are sorted by the specified columns. In some situations, however, there are tens of columns that may appear in a filter: sometimes all of them together, sometimes just a few. Z-order or order compaction can't handle this, because sorting on too many columns reduces the effectiveness of the sort. If the cardinality of these columns is small, we can use a Bloom filter or other indexes to speed up queries. That's why this PIP comes up: I want to introduce an index framework that gives Paimon a flexible index system.
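
To illustrate the idea, here is a minimal sketch of per-file Bloom filter pruning. It is written in Python for brevity and is not Paimon's API: the names `BloomFilter`, `build_file_index`, and `files_to_read`, the filter size, and the sample data are all hypothetical. It assumes one filter per indexed column per data file and only equality predicates.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        data = repr(value).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_file_index(rows, columns):
    """Build one Bloom filter per indexed column for a single data file."""
    index = {col: BloomFilter() for col in columns}
    for row in rows:
        for col in columns:
            index[col].add(row[col])
    return index


def files_to_read(file_indexes, equals):
    """Keep files whose filters may match every `column == value` predicate.

    Columns without an index never exclude a file.
    """
    return [
        name
        for name, index in file_indexes.items()
        if all(
            col not in index or index[col].might_contain(val)
            for col, val in equals.items()
        )
    ]


# Hypothetical data files, each a list of rows.
files = {
    "file-a": [{"city": "Paris", "status": "ok"}, {"city": "Rome", "status": "ok"}],
    "file-b": [{"city": "Tokyo", "status": "failed"}],
}
indexes = {
    name: build_file_index(rows, ["city", "status"]) for name, rows in files.items()
}
candidates = files_to_read(indexes, {"city": "Tokyo"})
```

Because each column has its own filter, a query can use any subset of the indexed columns, which is the case sort compaction struggles with. A file that really contains the value is always kept; a file that does not may occasionally be kept as a false positive.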

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

  • [X] I'm willing to submit a PR!

leaves12138 avatar Feb 29 '24 09:02 leaves12138

Should the index be at the row group (Parquet) or stripe (ORC) level, which would be better, or can it be configured at the file or row group level?

zyl891229 avatar Mar 11 '24 09:03 zyl891229

Hi @leaves12138 , what's the status of this feature?

FangYongs avatar May 10 '24 02:05 FangYongs