
planner: avoid exceeding the configured concurrency limit (#61786)

Open: ti-chi-bot opened this pull request (4 comments)

This is an automated cherry-pick of #61786

What problem does this PR solve?

Issue Number: close #61785

Problem Summary:

Customers observed higher I/O consumption while the analyze operation was processing indexes than while it was analyzing the table's columns. (The analyze status contains sensitive information, so it is not included here.)


The root cause is nested concurrency. When we run the analyze operation, we spawn multiple concurrent tasks to execute it, and inside each of those goroutines we spawn additional concurrent work. Because of this nesting, the actual degree of parallelism can approach the product of the concurrency at each level, which is far higher than intended.

CREATE TABLE `test` (
  `c1` binary(16) NOT NULL,
  `c2` tinyint(1) NOT NULL DEFAULT '0',
  `c3` int NOT NULL,
  `c4` varchar(48) COLLATE utf8mb4_general_ci NOT NULL,
  `c5` varchar(512) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `c6` enum('A','B','C') COLLATE utf8mb4_general_ci DEFAULT NULL,
  `c7` int unsigned NOT NULL DEFAULT '0',
  `c8` int unsigned NOT NULL DEFAULT '0',
  `c9` tinyint(1) GENERATED ALWAYS AS (`c7` > 0) VIRTUAL NOT NULL,
  `c10` int DEFAULT NULL,
  `c11` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  `c12` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  PRIMARY KEY (`c1`) /*T![clustered_index] CLUSTERED */,
  KEY `idx_c4_c2_c9_c3_c12_c5_c6` (`c4`,`c2`,`c9`,`c3`,`c12`,`c5`,`c6`),
  KEY `idx_c4_c2_c9_c12_c5_c6` (`c4`,`c2`,`c9`,`c12`,`c5`,`c6`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

analyze table test all columns;

show analyze status;

+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+
| Table_schema | Table_name | Partition_name | Job_info                                                                                                        | Processed_rows | Start_time          | End_time            | State    | Fail_reason | Instance       | Process_ID | Remaining_seconds | Progress | Estimated_total_rows |
+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+
| test         | test       |                | analyze ndv for index idx_c4_c2_c9_c12_c5_c6                                                                    | 0              | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null>      | 127.0.0.1:4000 | <null>     | <null>            | <null>   | <null>               |
| test         | test       |                | analyze ndv for index idx_c4_c2_c9_c3_c12_c5_c6                                                                 | 0              | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null>      | 127.0.0.1:4000 | <null>     | <null>            | <null>   | <null>               |
| test         | test       |                | analyze table all indexes, columns c1, c2, c3, c4, c5, c6, c7, c9, c12 with 256 buckets, 100 topn, 1 samplerate | 0              | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null>      | 127.0.0.1:4000 | <null>     | <null>            | <null>   | <null>               |
+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+

The output shows that two separate tasks are created to analyze NDV for the indexes; both indexes include the virtual generated column `c9`.

The problem lies in the following three places.

First level of concurrency:

https://github.com/pingcap/tidb/blob/8fc1430b8340589d2967697a457c730caef1f9ba/pkg/executor/analyze.go#L121-L126

Second level of concurrency, reached through this call path:

AnalyzeExec.analyzeWorker -> analyzeColumnsPushDownEntry -> analyzeColumnsPushDownV2

https://github.com/pingcap/tidb/blob/master/pkg/executor/analyze_col_v2.go#L105-L107

Third level of concurrency:

https://github.com/pingcap/tidb/blob/8fc1430b8340589d2967697a457c730caef1f9ba/pkg/executor/analyze_col_v2.go#L461-L466

This third level is the most dangerous. It lets the goroutines running handleNDVForSpecialIndexes run at the same time as the column-collection goroutines, so the concurrency of the two adds up and increases the risk to business workloads.

What changed and how does it work?

1. Wait until handleNDVForSpecialIndexes has completed before starting statistics collection for the columns.

2. To avoid changing the build stats concurrency, which could make the actual number of concurrent tasks grow multiplicatively, we set the concurrency here to the build sampling concurrency.

Check List

Tests

  • [x] Unit test
  • [ ] Integration test
  • [ ] Manual test (add detailed scripts or steps below)
  • [ ] No need to test
    • [ ] I checked and no code files have been changed.

Side effects

  • [ ] Performance regression: Consumes more CPU
  • [ ] Performance regression: Consumes more Memory
  • [ ] Breaking backward compatibility

Documentation

  • [ ] Affects user behaviors
  • [ ] Contains syntax changes
  • [ ] Contains variable changes
  • [ ] Contains experimental features
  • [ ] Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

ti-chi-bot, Jun 18 '25

@hawkingrei This PR has conflicts, so I have put it on hold. Please resolve them, or ask others to resolve them, then comment /unhold to remove the hold label.

ti-chi-bot, Jun 18 '25

/unhold

hawkingrei, Jul 08 '25

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AilinKid, hawkingrei

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [AilinKid,hawkingrei]

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

ti-chi-bot[bot], Jul 08 '25

[LGTM Timeline notifier]

Timeline:

  • 2025-07-04 23:33:38.094147216 +0000 UTC m=+1697070.817326196: :ballot_box_with_check: agreed by hawkingrei.
  • 2025-07-08 09:34:30.982899874 +0000 UTC m=+1992323.706078856: :ballot_box_with_check: agreed by AilinKid.

ti-chi-bot[bot], Jul 08 '25

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Please upload report for BASE (release-7.5@469af9d). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff                @@
##             release-7.5     #61813   +/-   ##
================================================
  Coverage               ?   72.2023%           
================================================
  Files                  ?       1417           
  Lines                  ?     414294           
  Branches               ?          0           
================================================
  Hits                   ?     299130           
  Misses                 ?      95172           
  Partials               ?      19992           
Flag          Coverage Δ
unit          72.2023% <100.0000%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components    Coverage Δ
dumpling      52.9400% <0.0000%> (?)
parser        ∅ <0.0000%> (?)
br            53.5323% <0.0000%> (?)

codecov[bot], Jul 08 '25

/retest

fixdb, Jul 08 '25