Potential Indexer Performance issue for large catalogs - Bulk Insert & Locking Entire CL table to identify the latest version

Open senthilengg opened this issue 1 month ago • 1 comments

Summary

The bulk insert at https://github.com/magento/magento2/blob/265cbda8dd4710a8f247acbb6f2052b57af24a4b/lib/internal/Magento/Framework/Mview/View/ChangelogBatchWalker.php#L99C13-L107C1 is not executed in batches, which may lead to DB downtime.

Retrieving the last id with ORDER BY version_id DESC may cause performance issues when the *_cl table has piled up a large number of rows, which is quite possible for stores with large catalogs:

https://github.com/magento/magento2/blob/410b59d8e952e9c94447cc2b31e63aebce44e9ab/lib/internal/Magento/Framework/Mview/View/Changelog.php#L315

The performance degradation can be moderate to severe, depending on the number of rows present in the CL table.

Examples

Insert a few hundred thousand to a million rows into the CL table and run the indexer.

Two slow components should be tested:

  1. https://github.com/magento/magento2/blob/265cbda8dd4710a8f247acbb6f2052b57af24a4b/lib/internal/Magento/Framework/Mview/View/ChangelogBatchWalker.php#L99C13-L107C1 << This insert query will slow down MySQL. In Adobe Commerce with a Galera cluster, flow control may kick in and stall writes, which may lead to downtime.
  2. The ORDER BY version_id DESC query with LIMIT, which will be slow on MySQL when the table is large.

Proposed solution

Since the indexer always runs in batches, the version id can be advanced one batch at a time:

https://github.com/magento/magento2/blob/410b59d8e952e9c94447cc2b31e63aebce44e9ab/lib/internal/Magento/Framework/Mview/View.php#L315

$currentVersionId = $lastVersionId + $batchSize;

The above method can help avoid locking the entire table, and insertFromSelect will also insert in small batches, which lets the indexer status update responsively instead of waiting for all rows to complete.

If there are 100,000 rows in the CL table, the indexer status currently is not updated until all 100K rows have completed in 20 batches (assuming 5,000 rows per batch). With the proposed change it will be updated after every batch.
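The batch arithmetic above can be checked with a small model. This is a plain Python sketch, not Magento code: `version_windows` and its parameter names are hypothetical, and the cap at `max_version_id` is an assumption added here so that the loop terminates (the issue itself only gives the uncapped expression `$lastVersionId + $batchSize`).

```python
from typing import Iterator, Tuple


def version_windows(last_version_id: int, max_version_id: int,
                    batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (from_exclusive, to_inclusive) version_id windows.

    Models the proposed `$currentVersionId = $lastVersionId + $batchSize`,
    capped at the newest version so the last window does not overshoot.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    while last_version_id < max_version_id:
        current_version_id = min(last_version_id + batch_size, max_version_id)
        yield last_version_id, current_version_id
        # Under the proposal, the indexer state would be saved here,
        # once per window, instead of once at the very end.
        last_version_id = current_version_id


windows = list(version_windows(0, 100_000, 5_000))
print(len(windows))             # 20 state updates instead of 1
print(windows[0], windows[-1])  # (0, 5000) (95000, 100000)
```

One caveat of this model: if version_id values have gaps, a window of `batch_size` ids may hold fewer than `batch_size` rows, so the real batch count could differ from the row-based estimate.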

Benefits

  1. More responsive indexer status

Admin panel shows actual progress instead of freezing for minutes and sometimes for hours.

  2. Less DB locking and better concurrency

Inserting in batches reduces long-running locks.

  3. Improved fault tolerance

If the process stops after N batches, the next run resumes from the version_id reached after batch N.

  4. Better observability and diagnosability

Logs and monitoring tools can see actual batch progression.
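The fault-tolerance benefit listed above can be illustrated with a minimal simulation. It is plain Python rather than Magento code, and every name in it (`run_indexer`, `SimulatedCrash`, the `state` dict) is hypothetical. It assumes the indexer state is persisted after every batch, which is what the proposal asks for.

```python
from typing import Optional


class SimulatedCrash(Exception):
    """Stands in for any failure that interrupts an indexer run."""


def run_indexer(state: dict, max_version_id: int, batch_size: int,
                fail_after_batches: Optional[int] = None) -> int:
    """Walk the changelog in windows and return the batches completed.

    `state["version_id"]` models the stored indexer state and is
    updated after every batch, so a crash loses at most one batch.
    """
    done = 0
    while state["version_id"] < max_version_id:
        if fail_after_batches is not None and done == fail_after_batches:
            raise SimulatedCrash(f"stopped after {done} batches")
        upper = min(state["version_id"] + batch_size, max_version_id)
        # ... reindex rows with version_id in (state["version_id"], upper] ...
        state["version_id"] = upper  # persisted per batch (assumption)
        done += 1
    return done


state = {"version_id": 0}
try:
    run_indexer(state, 100_000, 5_000, fail_after_batches=7)
except SimulatedCrash:
    pass
print(state["version_id"])  # 35000: progress of the 7 finished batches is kept

remaining = run_indexer(state, 100_000, 5_000)
print(remaining)            # 13 batches left for the next run
```

With a single state update at the end of the run, the same crash would leave the stored version at 0 and the next run would redo all 20 batches.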

Release note

No response

Triage and priority

  • [ ] Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • [x] Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • [ ] Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • [ ] Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • [ ] Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.

senthilengg avatar Nov 30 '25 05:11 senthilengg

Hi @senthilengg. Thank you for your report. To speed up processing of this issue, make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce.


Join Magento Community Engineering Slack and ask your questions in #github channel. :warning: According to the Magento Contribution requirements, all issues must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting. :clock10: You can find the schedule on the Magento Community Calendar page. :telephone_receiver: The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket.

m2-assistant[bot] avatar Nov 30 '25 05:11 m2-assistant[bot]

Hi @engcom-Hotel. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

  • [ ] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
  • [ ] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue.
  • [ ] 3. Add Area: XXXXX label to the ticket, indicating the functional areas it may be related to.
  • [ ] 4. Verify that the issue is reproducible on 2.4-develop branch
    - If the issue is reproducible on 2.4-develop branch, please add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!
  • [ ] 5. Add label Issue: Confirmed once verification is complete.
  • [ ] 6. Make sure that automatic system confirms that report has been added to the backlog.

m2-assistant[bot] avatar Dec 15 '25 06:12 m2-assistant[bot]

Hello @senthilengg,

Thank you for your report and collaboration!

Can you please let us know with what amount of data you are facing this issue?

Thank you

engcom-Hotel avatar Dec 15 '25 06:12 engcom-Hotel

@engcom-Hotel You can try with 500k+ rows (ideally with Galera cluster flow control). Alternatively, with 500k rows you can run the indexer and log the row count from the select query of insertFromSelect https://github.com/magento/magento2/blob/265cbda8dd4710a8f247acbb6f2052b57af24a4b/lib/internal/Magento/Framework/Mview/View/ChangelogBatchWalker.php#L99C13-L107C1 to confirm it is a single bulk insert, which can potentially bring down the DB.

senthilengg avatar Dec 15 '25 07:12 senthilengg