[Feature]: Monitor the health status of tables
Description
Monitor the health status of tables and display it to the users.
Use case/motivation
Amoro currently displays some metrics about tables, but it is difficult to judge a table's health from these metrics alone, and there is no way to alert users to unhealthy tables that need attention.
Amoro should be able to determine whether a table is healthy and alert users to problematic tables. A table's health may be determined by two factors:
- File status: the state of the data files and metadata files in the table. Overly fragmented (small) files and too many delete files indicate poor query performance for the table.
- Data quality: the data in the table should meet some user-defined rules.
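To make the two factors concrete, here is a minimal sketch of a health check that combines them. It assumes the file counts are already available and that data-quality rules can be expressed as per-row predicates. `FileStats`, `HealthThresholds`, `evaluate_health` and the default ratios are all hypothetical illustrations, not existing Amoro or Iceberg APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FileStats:
    """Hypothetical per-table file counts; not an Amoro or Iceberg API."""
    data_file_count: int
    small_file_count: int   # data files below some target size
    delete_file_count: int  # position/equality delete files


@dataclass
class HealthThresholds:
    """Hypothetical, user-configurable limits (assumed default values)."""
    max_small_file_ratio: float = 0.5
    max_delete_file_ratio: float = 0.2


def evaluate_health(
    stats: FileStats,
    rows: List[dict],
    rules: Dict[str, Callable[[dict], bool]],
    thresholds: Optional[HealthThresholds] = None,
) -> List[str]:
    """Return the problems found; an empty list means the table looks healthy."""
    t = thresholds or HealthThresholds()
    problems: List[str] = []

    # File status: compare small-file and delete-file ratios to the thresholds.
    total = max(stats.data_file_count, 1)  # avoid dividing by zero on empty tables
    if stats.small_file_count / total > t.max_small_file_ratio:
        problems.append("file: too many small files")
    if stats.delete_file_count / total > t.max_delete_file_ratio:
        problems.append("file: too many delete files")

    # Data quality: every user-defined rule must hold for every row.
    for name, rule in rules.items():
        failed = sum(1 for row in rows if not rule(row))
        if failed:
            problems.append(f"quality: rule '{name}' failed on {failed} row(s)")
    return problems
```

For example, a table with 70 small files out of 100 data files is flagged for fragmentation, and a rule such as `lambda r: r["id"] is not None` reports how many rows violate it. In a real system the quality rules would more likely run as queries against the table than as an in-memory row scan.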
Describe the solution
- Support health indicators configuration on tables
- Support tracking the health status of tables and prompting users to pay attention to unhealthy tables
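One way to "track the health status and prompt users" without flooding them is to remember the last known status per table and notify only when it changes. The sketch below assumes the health check produces a list of problems per table (empty meaning healthy) and that notification is a pluggable callback; `Health`, `HealthTracker` and the callback signature are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Callable, Dict, List


class Health(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


class HealthTracker:
    """Hypothetical tracker: keeps the last status per table and calls
    `notify` only on transitions, so users are not alerted repeatedly."""

    def __init__(self, notify: Callable[[str, Health, List[str]], None]):
        self._status: Dict[str, Health] = {}
        self._notify = notify

    def update(self, table: str, problems: List[str]) -> Health:
        new = Health.UNHEALTHY if problems else Health.HEALTHY
        old = self._status.get(table, Health.HEALTHY)  # unseen tables assumed healthy
        self._status[table] = new
        if new != old:
            self._notify(table, new, problems)
        return new

    def unhealthy_tables(self) -> List[str]:
        """Tables currently needing attention, sorted by name."""
        return sorted(t for t, s in self._status.items() if s is Health.UNHEALTHY)
```

Alerting on transitions (healthy to unhealthy and back) rather than on every evaluation is a design choice; a periodic reminder for tables that stay unhealthy could be layered on top.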
Subtasks
- [x] #3180
Related issues
No response
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Hi @zhoujinsong, we can collect more than 25 metrics. These metrics fall into several groups:
- Snapshot metrics: totals of, and changes in, data files, delete files, records added or removed, and file size.
- Partition and file metrics: aggregated and per-partition metrics such as average, maximum, and minimum record counts and file sizes, which help in understanding data distribution and optimizing storage. Ref: https://aws.amazon.com/vi/blogs/big-data/monitoring-apache-iceberg-metadata-layer-using-aws-lambda-aws-glue-and-aws-cloudwatch/
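As a rough illustration of the per-partition metrics, the sketch below groups data files by partition and computes average, maximum, and minimum record counts and file sizes. It assumes each file is described by its partition, record count, and size in bytes (for Iceberg, comparable columns are available from the `files` metadata table); `DataFile` and `partition_metrics` are hypothetical names, not part of Amoro.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List


@dataclass
class DataFile:
    """Hypothetical description of one data file."""
    partition: str
    record_count: int
    file_size_bytes: int


def partition_metrics(files: Iterable[DataFile]) -> Dict[str, Dict[str, float]]:
    """Aggregate record counts and file sizes per partition."""
    by_partition: Dict[str, List[DataFile]] = defaultdict(list)
    for f in files:
        by_partition[f.partition].append(f)

    result: Dict[str, Dict[str, float]] = {}
    for part, fs in by_partition.items():
        records = [f.record_count for f in fs]
        sizes = [f.file_size_bytes for f in fs]
        result[part] = {
            "file_count": len(fs),
            "avg_record_count": mean(records),
            "max_record_count": max(records),
            "min_record_count": min(records),
            "avg_file_size": mean(sizes),
            "max_file_size": max(sizes),
            "min_file_size": min(sizes),
        }
    return result
```

A partition whose average file size is far below the target size, or whose minimum and maximum differ widely, is a natural input to the file-status part of a health check.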
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the 14 days since being marked as 'stale'.