[Improvement]: More detail for table metrics

Open zhongqishang opened this issue 2 years ago • 3 comments

Search before asking

  • [X] I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently, the Base table metrics only include File Count, Total Size, and Average File Size. These statistics mix data files and delete files together.

How should we improve?

  • Add Average Data File Size
  • Show data file, eq delete file, and pos delete file counts separately
  • Add the ratio of eq delete count to data count
  • Add the ratio of pos delete count to data count
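
To make the proposal concrete, here is a minimal sketch of how the four metrics could be derived from a flat list of files. It is not Amoro's implementation: `FileKind`, `FileEntry`, and `summarize` are hypothetical names, and the sketch assumes the "count ratio" means delete record count divided by data record count (the issue text could also be read as a file count ratio).

```python
from dataclasses import dataclass
from enum import Enum


class FileKind(Enum):
    """Hypothetical file categories, named after the ones in this issue."""
    DATA = "data"
    EQ_DELETE = "eq_delete"
    POS_DELETE = "pos_delete"


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical per-file record; not an Amoro or Iceberg class."""
    kind: FileKind
    size_bytes: int
    record_count: int


def summarize(files):
    """Compute per-kind file counts, average data file size, and delete ratios."""
    counts = {kind: 0 for kind in FileKind}
    sizes = {kind: 0 for kind in FileKind}
    records = {kind: 0 for kind in FileKind}
    for f in files:
        counts[f.kind] += 1
        sizes[f.kind] += f.size_bytes
        records[f.kind] += f.record_count

    data_files = counts[FileKind.DATA]
    data_records = records[FileKind.DATA]

    def ratio(numerator, denominator):
        # Report 0.0 instead of failing when there is nothing to divide by.
        return numerator / denominator if denominator else 0.0

    return {
        "data_file_count": data_files,
        "eq_delete_file_count": counts[FileKind.EQ_DELETE],
        "pos_delete_file_count": counts[FileKind.POS_DELETE],
        # Average over data files only, so delete files no longer skew it.
        "avg_data_file_size": ratio(sizes[FileKind.DATA], data_files),
        # Assumption: ratios are record-based (delete records / data records).
        "eq_delete_ratio": ratio(records[FileKind.EQ_DELETE], data_records),
        "pos_delete_ratio": ratio(records[FileKind.POS_DELETE], data_records),
    }
```

With this shape, a table that has no delete files simply reports zero counts and zero ratios.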

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Subtasks

No response

zhongqishang avatar Dec 04 '23 09:12 zhongqishang

There are many details to consider.

  1. Separate parameter designs may be necessary for different Table Formats.
  2. When users are using the Iceberg Table in an append scenario, there are no delete files, so it's better not to display delete-related metrics at that time.
  3. Change Store and Base Store of Mixed Format also need separate designs.

BTW, we can access this detailed metrics information on the snapshots page.
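
As an illustration of deriving the same numbers from snapshot-level information, the sketch below reads an Iceberg-style snapshot summary map. It assumes the summary carries the optional Iceberg keys `total-records`, `total-data-files`, `total-delete-files`, `total-equality-deletes`, and `total-position-deletes` as string values; whether the snapshots page exposes exactly these keys is an assumption, and `delete_metrics_from_summary` is a hypothetical helper, not an Amoro API.

```python
def delete_metrics_from_summary(summary):
    """Derive delete-related metrics from an Iceberg-style snapshot summary.

    `summary` is a plain dict of string keys to string values. Missing keys
    are treated as zero, which covers append-only tables.
    """
    def as_int(key):
        return int(summary.get(key, "0"))

    total_records = as_int("total-records")
    eq_deletes = as_int("total-equality-deletes")    # delete records, not files
    pos_deletes = as_int("total-position-deletes")   # delete records, not files
    delete_files = as_int("total-delete-files")

    def ratio(numerator, denominator):
        # Avoid division by zero for empty tables.
        return numerator / denominator if denominator else 0.0

    return {
        "data_file_count": as_int("total-data-files"),
        "delete_file_count": delete_files,
        "eq_delete_ratio": ratio(eq_deletes, total_records),
        "pos_delete_ratio": ratio(pos_deletes, total_records),
        # A UI could use this flag to hide delete metrics in append scenarios.
        "has_deletes": delete_files > 0 or eq_deletes > 0 or pos_deletes > 0,
    }
```

The `has_deletes` flag leaves the display decision to the caller: the metrics can be hidden for append-only tables or shown as 0.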

wangtaohz avatar Dec 07 '23 05:12 wangtaohz

  • Add eq delete count ratio for data count
  • Add pos delete count ratio for data count

@zhongqishang I'm a little curious about the real requirements for displaying these ratios :)

wangtaohz avatar Dec 07 '23 05:12 wangtaohz

There are many details to consider.

  1. Separate parameter designs may be necessary for different Table Formats.

Each format requires a different design, or even no display at all. The original idea came from the Iceberg native format.

  2. When users are using the Iceberg Table in an append scenario, there are no delete files, so it's better not to display delete-related metrics at that time.

Displaying 0 when there are no delete files is fine.

  3. Change Store and Base Store of Mixed Format also need separate designs.

BTW, we can access these detailed metrics information on the snapshots page.

Yes, all of this information can be found on that page, but it is not intuitive enough.

  • Add eq delete count ratio for data count
  • Add pos delete count ratio for data count

@zhongqishang I'm a little curious about the real requirements for displaying these ratios :)

Query results need to be merged with delete files. In some abnormal situations, the number of deletes gives a very direct indication when analyzing query behavior on the table.

For example, when a large self-optimizing.major.trigger.duplicate-ratio is configured, or when the compaction of eq delete files is not completed in time.

@wangtaohz

zhongqishang avatar Dec 07 '23 06:12 zhongqishang

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Aug 22 '24 00:08 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Sep 06 '24 00:09 github-actions[bot]