starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[Feature] add tablet statistics system table to track the data accessing information

Open MatthewH00 opened this issue 6 months ago • 18 comments

Feature request

Is your feature request related to a problem? Please describe. Currently the tablet lacks operational behavior(like read/write) statistics, based on statistics, we could acquire more information about the data accessing information. A typical scenario such as: when user set a long TTL for table partitions, but the historic partitions are never scanned or rarely scanned, it would cause unnecessary storage waste and metadata pressure. So more detailed partition scan information is needed to distinguish cold and hot partitions, it would become a reference to data governance.

Describe the solution you'd like Add a system table _statistics_.tablet_statistics to collect the tablet statistics. note: currently just collect tablet read statistics, in the future could collect more statistics like write.

Technical Solution 1.create system table When the cluster is initially build or upgrade from old version, the system table _statistics_.tablet_statistics would be created in _statistics_ system database. create table sql:

CREATE TABLE _statistics_.tablet_statistics (
  `dt` date NOT NULL COMMENT "",
  `catalog_name` varchar(65533) NOT NULL COMMENT "",
  `db_name` varchar(65533) NOT NULL COMMENT "",
  `table_name` varchar(65533) NOT NULL COMMENT "",
  `partition_name` varchar(65533) NOT NULL COMMENT "",
  `tablet_id` bigint(20) NULL COMMENT "",
  `read_count` double SUM NOT NULL COMMENT ""
) ENGINE=OLAP 
AGGREGATE KEY(`dt`, `catalog_name`, `db_name`, `table_name`, `partition_name`, `tablet_id`)
PARTITION BY date_trunc('day', `dt`)
DISTRIBUTED BY HASH(`tablet_id`) BUCKETS 10 
PROPERTIES (
"partition_live_number" = "3"
);

the system table design point as follows:

  • physical table, not a view, the data is saved on be, load data by stream load.

note: the reason of choose physical table not a view is in typical scenario of data governance usually need trace back to the last 30/90/180/365 days, too many tablet statistics would cause the fe resource waste.

  • aggregate model table, the aggregate key is dt,catalog_name,db_name,table_name,partition_name,tablet_id, the metric key is read_count and aggregate function is SUM

  • partitioned table, partition key is dt, the ttl is set by parameter tablet_statistics_keep_days(default value is 3 days)

  • when the table in the query sql is partitioned, the partition_name in _statistics_.tablet_statistics display partition name. otherwise display table name.

2.feature enable the feature is enabled by parameter enable_tablet_statistics(default value is false)

3.feature workflow

Image

  • handle query in handleQueryStmt and transfer execPlan to TabletStatisticsMgr

  • parse scanTabletId from execPlan

  • accumulate data for scanTabletId in batch, load frequency is set by parameter tablet_statistics_load_interval_seconds(default value is 5 min) and batch size(default value is 500)

  • when load is required, get catalog/db/table/partition info from scanTabletId, and build tablet statistics object

  • load tablet statistics into system table by streamLoad

4.system table query result

MySQL [_statistics_]> select * from _statistics_.tablet_statistics;
+------------+-----------------+---------------------------+----------------------------+----------------+-----------+------------+
| dt         | catalog_name    | db_name                   | table_name                 | partition_name | tablet_id | read_count |
+------------+-----------------+---------------------------+----------------------------+----------------+-----------+------------+
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250603      |    132584 |          2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250602      |    132580 |          2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250601      |    132572 |          2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250602      |    132576 |          2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250601      |    132570 |          2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250602      |    132578 |          2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250603      |    132588 |          2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250601      |    132568 |          2 |
| 2025-06-11 | default_catalog | test_tablet_statistics_db | test_tablet_statistics_tab | p20250603      |    132586 |          2 |
+------------+-----------------+---------------------------+----------------------------+----------------+-----------+------------+
9 rows in set (0.01 sec)

MatthewH00 avatar Jun 06 '25 03:06 MatthewH00

When user set a long TTL for table partitions, but the historic partitions are never scanned or rarely scanned, it would cause unnecessary storage waste and metadata pressure. So more detailed partition scan information is needed to distinguish cold and hot partitions, it would become a reference to data governance. With the help of external tools, it is easy to parse the source table from query SQL, but it is difficult to parse which partition is scanned.

This scenario makes sense. This proposal just inspired me.

  1. Is only the partition statistic enough? Why not collect statistics at the tablet level?
  2. Is only scan statistics enough? Why not collect other statistics?
  3. If we need to collect new statistics, do we require another system of views or tables? Is it better to have these statistics in the same system views/tables?

So, could you consider these questions, and I believe we can make it a more powerful tool.

alvin-celerdata avatar Jun 06 '25 20:06 alvin-celerdata

This scenario makes sense. This proposal just inspired me.

  1. Is only the partition statistic enough? Why not collect statistics at the tablet level?
  2. Is only scan statistics enough? Why not collect other statistics?
  3. If we need to collect new statistics, do we require another system of views or tables? Is it better to have these statistics in the same system views/tables?

So, could you consider these questions, and I believe we can make it a more powerful tool.

This proposal is raised from our data governance for clusters. From the perspective of a cluster administrator, data usage information is essential. For your questions: 1.For the purpose of data governance, I think collecting statistics at the partition level is enough. eg. if a historical partition is never scanned or rarely scanned, we could consider dropping it. However, collecting statistics at the tablet level raises the question of what specific actions we would take based on that information. 2.I haven't yet considered other statistics beyond scan statistics. If we find that we need more detailed statistics in the future, I would certainly add them as required. 3.Maybe it depends on what is new statistics, then decide to put it together or not. Maybe for this specific demand of partition scan statistics, you could help to push the linked PR #59621. I believe this is just the beginning, and more community members will come to enrich this scenario in the future.

MatthewH00 avatar Jun 07 '25 00:06 MatthewH00

@MatthewH00

When a new feature is added to a database system, it will be very hard, if not impossible, to change or deprecate it. Because users have depended on these features, any behavior change will cause incompatibility. That's why we need to discuss it before implementing it. Suppose the table name is partition_scan, but we later want to collect the write count for a specific partition. It doesn't make sense for users to get write information from a table whose name ends with a _scan. That's why I asked those questions. My point is that before we make a decision, we need to collect thoughts from others and discuss them. I completely agree with you that we can iterate on new features and start from a small step. However, we must be cautious when introducing a new interface. This is why I would like to know about the new feature before reviewing the implementation.

Some topics that I want to discuss

  1. Where is the table created? _statistics_ or information_schema or others? and why?
  2. Can we collect tablet-level statistics? With this information, you can also achieve your goal. Please keep in mind that there may be other users experiencing different issues.
  3. We should consider assigning a different name to this table to leave room for future improvements.

alvin-celerdata avatar Jun 07 '25 01:06 alvin-celerdata

@MatthewH00

When a new feature is added to a database system, it will be very hard, if not impossible, to change or deprecate it. Because users have depended on these features, any behavior change will cause incompatibility. That's why we need to discuss it before implementing it. Suppose the table name is partition_scan, but we later want to collect the write count for a specific partition. It doesn't make sense for users to get write information from a table whose name ends with a _scan. That's why I asked those questions. My point is that before we make a decision, we need to collect thoughts from others and discuss them. I completely agree with you that we can iterate on new features and start from a small step. However, we must be cautious when introducing a new interface. This is why I would like to know about the new feature before reviewing the implementation.

Some topics that I want to discuss

  1. Where is the table created? _statistics_ or information_schema or others? and why?
  2. Can we collect tablet-level statistics? With this information, you can also achieve your goal. Please keep in mind that there may be other users experiencing different issues.
  3. We should consider assigning a different name to this table to leave room for future improvements.

@alvin-celerdata Hi for your three question: 1.currently it is created on _statistics_, because i notice the all system views are in information_schema, and all system physical tables are in _statistics_. the reason of choose physical table stored in _statistics_ not a view is data governance usually need trace back to the last 30/90/180/365 days, too many partition scan information would cause the fe resource waste. 2.i check the code, could collect tablet-level statistics. you think is neccessary to the tablet-level not just partition-level ? 3.just as you say above, we could replace a suitable name.

MatthewH00 avatar Jun 09 '25 05:06 MatthewH00

2.i check the code, could collect tablet-level statistics. you think is neccessary to the tablet-level not just partition-level ?

If we are going to do it at the tablet level, could you estimate the amount of resources required to achieve this feature? Will it impact the performance of the cluster? And the detailed design to implement this feature.

alvin-celerdata avatar Jun 09 '25 16:06 alvin-celerdata

If we are going to do it at the tablet level, could you estimate the amount of resources required to achieve this feature? Will it impact the performance of the cluster? And the detailed design to implement this feature.

@alvin-celerdata 1.looking at the code, it is not difficult to collect tablet-level statistics. i would submit a new pr and the detailed design docment later today. then you could help to review. and the new system table name would named as tablet_statistics in _statistics_ database, is it ok? 2.one thing maybe impact the performance of the cluster I can think of is produce too many rows statistics data at the tablet-level. currently loading statistics data by stream load, control a suitable load frequecy would reduce impact the performance of the cluster when collect statistics data.

MatthewH00 avatar Jun 10 '25 01:06 MatthewH00

Could you describe your detailed design of implementation before submitting the PR?

alvin-celerdata avatar Jun 10 '25 02:06 alvin-celerdata

Could you describe your detailed design of implementation before submitting the PR?

OK, i would submit detailed design later today.

MatthewH00 avatar Jun 10 '25 02:06 MatthewH00

Could you describe your detailed design of implementation before submitting the PR?

@alvin-celerdata Hi, the detailed design of implementation has submit, please review it when have free time.

MatthewH00 avatar Jun 10 '25 10:06 MatthewH00

@alvin-celerdata Hi the PR collect statistics at the tablet-level has submit #59819. please review it when have free time.

MatthewH00 avatar Jun 11 '25 12:06 MatthewH00

@MatthewH00 it is not easy to comment in github, could you mind to create a google doc which will be easy to comment.

alvin-celerdata avatar Jun 12 '25 01:06 alvin-celerdata

@MatthewH00 it is not easy to comment in github, could you mind to create a google doc which will be easy to comment.

@alvin-celerdata feishu doc link : https://g1nj3l24fs.feishu.cn/docx/XMo7dL8GFoukMGxil54ccWc4n8d

MatthewH00 avatar Jun 12 '25 03:06 MatthewH00

@alvin-celerdata feishu doc link : https://g1nj3l24fs.feishu.cn/docx/XMo7dL8GFoukMGxil54ccWc4n8d

@MatthewH00 Google Docs is preferred, which is good for contributors from all around the world.

alvin-celerdata avatar Jun 12 '25 17:06 alvin-celerdata

@alvin-celerdata feishu doc link : https://g1nj3l24fs.feishu.cn/docx/XMo7dL8GFoukMGxil54ccWc4n8d

@MatthewH00 Google Docs is preferred, which is good for contributors from all around the world.

@alvin-celerdata Sorry, i am work in beijing china, i can not register the google account because of phone validate failed, so i can not create the google docs. Could we communicate in current feishu doc ? Later i would organize the content of our communication and put it on GitHub.

@alvin-celerdata Please help further review the design doc or the PR #59819, the related UT and the SQL-Tester has passed.

MatthewH00 avatar Jun 16 '25 06:06 MatthewH00

@MatthewH00 It is very hard for global contributors to access this document

Image

alvin-celerdata avatar Jun 16 '25 07:06 alvin-celerdata

@alvin-celerdata Hi, Please help further review the design doc, this is the Google design doc StarRocks tablet statistics system table design Doc

MatthewH00 avatar Jun 17 '25 09:06 MatthewH00

@MatthewH00, could you make the document accessible for commenting by everyone who has the link? I request the access privilege.

alvin-celerdata avatar Jun 17 '25 10:06 alvin-celerdata

@MatthewH00, could you make the document accessible for commenting by everyone who has the link? I request the access privilege.

@alvin-celerdata Please retry.

MatthewH00 avatar Jun 17 '25 10:06 MatthewH00

@MatthewH00 @alvin-celerdata Excuse me, the feature of access statistic at the partition level is implemented done? In our prod env, we also need the partition level system table to track the data access information when doing cluster data governance.Expect this feature.

Marvin5181 avatar Aug 07 '25 03:08 Marvin5181

@MatthewH00 @alvin-celerdata Excuse me, the feature of access statistic at the partition level is implemented done? In our prod env, we also need the partition level system table to track the data access information when doing cluster data governance.Expect this feature.

@alvin-celerdata It looks like other community partner need same feature in their prod env. Wonder if the previous design document has been evaluated for feasibility? StarRocks tablet statistics system table design Doc

MatthewH00 avatar Aug 07 '25 06:08 MatthewH00