[Feature] add tablet statistics system table to track the data accessing information
Feature request
Is your feature request related to a problem? Please describe. Currently the tablet lacks operational behavior(like read/write) statistics, based on statistics, we could acquire more information about the data accessing information. A typical scenario such as: when user set a long TTL for table partitions, but the historic partitions are never scanned or rarely scanned, it would cause unnecessary storage waste and metadata pressure. So more detailed partition scan information is needed to distinguish cold and hot partitions, it would become a reference to data governance.
Describe the solution you'd like
Add a system table _statistics_.tablet_statistics to collect the tablet statistics.
note: currently just collect tablet read statistics, in the future could collect more statistics like write.
Technical Solution
1.create system table
When the cluster is initially build or upgrade from old version, the system table _statistics_.tablet_statistics would be created in _statistics_ system database.
create table sql:
CREATE TABLE _statistics_.tablet_statistics (
`dt` date NOT NULL COMMENT "",
`catalog_name` varchar(65533) NOT NULL COMMENT "",
`db_name` varchar(65533) NOT NULL COMMENT "",
`table_name` varchar(65533) NOT NULL COMMENT "",
`partition_name` varchar(65533) NOT NULL COMMENT "",
`tablet_id` bigint(20) NULL COMMENT "",
`read_count` double SUM NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `catalog_name`, `db_name`, `table_name`, `partition_name`, `tablet_id`)
PARTITION BY date_trunc('day', `dt`)
DISTRIBUTED BY HASH(`tablet_id`) BUCKETS 10
PROPERTIES (
"partition_live_number" = "3"
);
the system table design point as follows:
- physical table, not a view, the data is saved on be, load data by stream load.
note: the reason of choose physical table not a view is in typical scenario of data governance usually need trace back to the last 30/90/180/365 days, too many tablet statistics would cause the fe resource waste.
-
aggregate model table, the aggregate key is
dt,catalog_name,db_name,table_name,partition_name,tablet_id, the metric key isread_countand aggregate function isSUM -
partitioned table, partition key is
dt, the ttl is set by parametertablet_statistics_keep_days(default value is 3 days) -
when the table in the query sql is partitioned, the
partition_namein_statistics_.tablet_statisticsdisplay partition name. otherwise display table name.
2.feature enable
the feature is enabled by parameter enable_tablet_statistics(default value is false)
3.feature workflow
-
handle query in handleQueryStmt and transfer execPlan to TabletStatisticsMgr
-
parse scanTabletId from execPlan
-
accumulate data for scanTabletId in batch, load frequency is set by parameter
tablet_statistics_load_interval_seconds(default value is 5 min) and batch size(default value is 500) -
when load is required, get catalog/db/table/partition info from scanTabletId, and build tablet statistics object
-
load tablet statistics into system table by streamLoad
4.system table query result
MySQL [_statistics_]> select * from _statistics_.tablet_statistics;
+------------+-----------------+---------------------------+----------------------------+----------------+-----------+------------+
| dt | catalog_name | db_name | table_name | partition_name | tablet_id | read_count |
+------------+-----------------+---------------------------+----------------------------+----------------+-----------+------------+
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250603 | 132584 | 2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250602 | 132580 | 2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250601 | 132572 | 2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250602 | 132576 | 2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250601 | 132570 | 2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250602 | 132578 | 2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250603 | 132588 | 2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250601 | 132568 | 2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250603 | 132586 | 2 |
+------------+-----------------+---------------------------+----------------------------+----------------+-----------+------------+
9 rows in set (0.01 sec)
When user set a long TTL for table partitions, but the historic partitions are never scanned or rarely scanned, it would cause unnecessary storage waste and metadata pressure. So more detailed partition scan information is needed to distinguish cold and hot partitions, it would become a reference to data governance. With the help of external tools, it is easy to parse the source table from query SQL, but it is difficult to parse which partition is scanned.
This scenario makes sense. This proposal just inspired me.
- Is only the partition statistic enough? Why not collect statistics at the tablet level?
- Is only scan statistics enough? Why not collect other statistics?
- If we need to collect new statistics, do we require another system of views or tables? Is it better to have these statistics in the same system views/tables?
So, could you consider these questions, and I believe we can make it a more powerful tool.
This scenario makes sense. This proposal just inspired me.
- Is only the partition statistic enough? Why not collect statistics at the tablet level?
- Is only scan statistics enough? Why not collect other statistics?
- If we need to collect new statistics, do we require another system of views or tables? Is it better to have these statistics in the same system views/tables?
So, could you consider these questions, and I believe we can make it a more powerful tool.
This proposal is raised from our data governance for clusters. From the perspective of a cluster administrator, data usage information is essential. For your questions: 1.For the purpose of data governance, I think collecting statistics at the partition level is enough. eg. if a historical partition is never scanned or rarely scanned, we could consider dropping it. However, collecting statistics at the tablet level raises the question of what specific actions we would take based on that information. 2.I haven't yet considered other statistics beyond scan statistics. If we find that we need more detailed statistics in the future, I would certainly add them as required. 3.Maybe it depends on what is new statistics, then decide to put it together or not. Maybe for this specific demand of partition scan statistics, you could help to push the linked PR #59621. I believe this is just the beginning, and more community members will come to enrich this scenario in the future.
@MatthewH00
When a new feature is added to a database system, it will be very hard, if not impossible, to change or deprecate it. Because users have depended on these features, any behavior change will cause incompatibility. That's why we need to discuss it before implementing it.
Suppose the table name is partition_scan, but we later want to collect the write count for a specific partition. It doesn't make sense for users to get write information from a table whose name ends with a _scan. That's why I asked those questions.
My point is that before we make a decision, we need to collect thoughts from others and discuss them.
I completely agree with you that we can iterate on new features and start from a small step. However, we must be cautious when introducing a new interface. This is why I would like to know about the new feature before reviewing the implementation.
Some topics that I want to discuss
- Where is the table created?
_statistics_orinformation_schemaor others? and why? - Can we collect tablet-level statistics? With this information, you can also achieve your goal. Please keep in mind that there may be other users experiencing different issues.
- We should consider assigning a different name to this table to leave room for future improvements.
When a new feature is added to a database system, it will be very hard, if not impossible, to change or deprecate it. Because users have depended on these features, any behavior change will cause incompatibility. That's why we need to discuss it before implementing it. Suppose the table name is
partition_scan, but we later want to collect the write count for a specific partition. It doesn't make sense for users to getwriteinformation from a table whose name ends with a_scan. That's why I asked those questions. My point is that before we make a decision, we need to collect thoughts from others and discuss them. I completely agree with you that we can iterate on new features and start from a small step. However, we must be cautious when introducing a new interface. This is why I would like to know about the new feature before reviewing the implementation.Some topics that I want to discuss
- Where is the table created?
_statistics_orinformation_schemaor others? and why?- Can we collect tablet-level statistics? With this information, you can also achieve your goal. Please keep in mind that there may be other users experiencing different issues.
- We should consider assigning a different name to this table to leave room for future improvements.
@alvin-celerdata
Hi for your three question:
1.currently it is created on _statistics_, because i notice the all system views are in information_schema, and all system physical tables are in _statistics_. the reason of choose physical table stored in _statistics_ not a view is data governance usually need trace back to the last 30/90/180/365 days, too many partition scan information would cause the fe resource waste.
2.i check the code, could collect tablet-level statistics. you think is neccessary to the tablet-level not just partition-level ?
3.just as you say above, we could replace a suitable name.
2.i check the code, could collect tablet-level statistics. you think is neccessary to the tablet-level not just partition-level ?
If we are going to do it at the tablet level, could you estimate the amount of resources required to achieve this feature? Will it impact the performance of the cluster? And the detailed design to implement this feature.
If we are going to do it at the tablet level, could you estimate the amount of resources required to achieve this feature? Will it impact the performance of the cluster? And the detailed design to implement this feature.
@alvin-celerdata
1.looking at the code, it is not difficult to collect tablet-level statistics. i would submit a new pr and the detailed design docment later today. then you could help to review. and the new system table name would named as tablet_statistics in _statistics_ database, is it ok?
2.one thing maybe impact the performance of the cluster I can think of is produce too many rows statistics data at the tablet-level. currently loading statistics data by stream load, control a suitable load frequecy would reduce impact the performance of the cluster when collect statistics data.
Could you describe your detailed design of implementation before submitting the PR?
Could you describe your detailed design of implementation before submitting the PR?
OK, i would submit detailed design later today.
Could you describe your detailed design of implementation before submitting the PR?
@alvin-celerdata Hi, the detailed design of implementation has submit, please review it when have free time.
@alvin-celerdata Hi the PR collect statistics at the tablet-level has submit #59819. please review it when have free time.
@MatthewH00 it is not easy to comment in github, could you mind to create a google doc which will be easy to comment.
@MatthewH00 it is not easy to comment in github, could you mind to create a google doc which will be easy to comment.
@alvin-celerdata feishu doc link : https://g1nj3l24fs.feishu.cn/docx/XMo7dL8GFoukMGxil54ccWc4n8d
@alvin-celerdata feishu doc link : https://g1nj3l24fs.feishu.cn/docx/XMo7dL8GFoukMGxil54ccWc4n8d
@MatthewH00 Google Docs is preferred, which is good for contributors from all around the world.
@alvin-celerdata feishu doc link : https://g1nj3l24fs.feishu.cn/docx/XMo7dL8GFoukMGxil54ccWc4n8d
@MatthewH00 Google Docs is preferred, which is good for contributors from all around the world.
@alvin-celerdata Sorry, i am work in beijing china, i can not register the google account because of phone validate failed, so i can not create the google docs. Could we communicate in current feishu doc ? Later i would organize the content of our communication and put it on GitHub.
@alvin-celerdata Please help further review the design doc or the PR #59819, the related UT and the SQL-Tester has passed.
@MatthewH00 It is very hard for global contributors to access this document
@alvin-celerdata Hi, Please help further review the design doc, this is the Google design doc StarRocks tablet statistics system table design Doc
@MatthewH00, could you make the document accessible for commenting by everyone who has the link? I request the access privilege.
@MatthewH00, could you make the document accessible for commenting by everyone who has the link? I request the access privilege.
@alvin-celerdata Please retry.
@MatthewH00 @alvin-celerdata Excuse me, the feature of access statistic at the partition level is implemented done? In our prod env, we also need the partition level system table to track the data access information when doing cluster data governance.Expect this feature.
@MatthewH00 @alvin-celerdata Excuse me, the feature of access statistic at the partition level is implemented done? In our prod env, we also need the partition level system table to track the data access information when doing cluster data governance.Expect this feature.
@alvin-celerdata It looks like other community partner need same feature in their prod env. Wonder if the previous design document has been evaluated for feasibility? StarRocks tablet statistics system table design Doc