[core] Introduce level0FileCount for partitions table

Open MonsterChenzhuo opened this issue 1 year ago • 4 comments

Purpose

Linked issue: close #xxx

Tests

PartitionsTableTest#testLevel0FileCountValue

API and Format

Documentation

MonsterChenzhuo avatar Aug 27 '24 10:08 MonsterChenzhuo

Thanks @MonsterChenzhuo for the contribution.

But what is the use case for level0file?

JingsongLi avatar Aug 29 '24 01:08 JingsongLi

When a table has deletion vectors (DelVector) enabled, users can check the level-0 file count to quickly determine whether data has been written and whether compaction has completed, especially when a query returns no data for the current partition.

The $files system table can answer the same question, but its results are less intuitive: users usually have to aggregate the per-file rows themselves to interpret them.
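
For illustration (this sketch is not part of the original comment), the aggregation in question might look like the query below. It assumes that $files exposes `partition` and `level` columns, as described in the Paimon system table documentation, and it reuses the table name default.T from later in this thread. Identifiers are backtick-quoted because several of them are reserved words in Flink SQL and Spark SQL.

```sql
-- Sketch: count level-0 data files per partition from the $files system table.
-- Assumption: the `partition` and `level` columns exist as documented for $files.
SELECT `partition`,
       COUNT(*) AS level0_file_count
FROM `default`.`T$files`
WHERE `level` = 0
GROUP BY `partition`;
```

A partition that has no level-0 files simply does not appear in this result, which is part of why the raw $files output is harder to read than a dedicated column would be.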

MonsterChenzhuo avatar Aug 29 '24 12:08 MonsterChenzhuo

But doesn't this depend on the bucket? What we need to know is maxLevel0FilesInBucket and avgLevel0FilesInBucket, so it may be better to just leave them in metrics.
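
As a rough sketch only (not something proposed in the thread), per-bucket figures comparable to those two metrics could be approximated from $files with a two-step aggregation. The output column names are hypothetical, and the `partition`, `bucket`, and `level` columns are assumed to exist as documented for $files.

```sql
-- Sketch: per-partition max and average number of level-0 files per bucket.
-- Note: buckets with no level-0 files produce no rows in the inner query,
-- so the average covers only buckets that currently hold level-0 files.
SELECT `partition`,
       MAX(l0_files)                 AS max_level0_files_in_bucket,
       AVG(CAST(l0_files AS DOUBLE)) AS avg_level0_files_in_bucket
FROM (
    SELECT `partition`, `bucket`, COUNT(*) AS l0_files
    FROM `default`.`T$files`
    WHERE `level` = 0
    GROUP BY `partition`, `bucket`
) per_bucket
GROUP BY `partition`;
```

The values need not match the metrics reported by a running writer, since the query reads the latest committed snapshot rather than the writer's in-memory state.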

JingsongLi avatar Aug 30 '24 01:08 JingsongLi

For real-time writes to a Paimon table, we compact in real time and collect metrics to monitor maxLevel0FilesInBucket and avgLevel0FilesInBucket. However, for tables that are updated infrequently (for example, T+1 loads), that need high throughput and low resource consumption, and that rely on offline compaction, monitoring the number of L0 files through metrics is less convenient than querying a system table.

The operational path would be as follows. First, check the system table to see whether any L0 data remains in the partition: SELECT * FROM default.T$partitions; If it does, run compaction through the SQL stored procedure: CALL sys.compact(table => 'default.T');

MonsterChenzhuo avatar Sep 02 '24 13:09 MonsterChenzhuo

This seems like a fairly specific use case; let's wait for future requirements.

JingsongLi avatar Oct 30 '24 06:10 JingsongLi