magento2 icon indicating copy to clipboard operation
magento2 copied to clipboard

Price index performance on very large stores

Open nicka101 opened this issue 2 years ago • 20 comments

Preconditions and environment

  • Magento version 2.4.5+ (or any release after 263c17f)
  • Have a lot of products (the more, with more prices, the easier this issue is to reproduce for demo)

Steps to reproduce

  1. Reindex catalog_product_index_price
  2. Observe that since 263c17f the performance of deleteOutdatedData is much worse (around 800-1200x slower than it should be in our very large instances with ~30 million prices)

Expected result

Price reindex is workable, and uses indexes for its join in deleteOutdatedData, because they exist

Actual result

It's 800-1200x slower than it should be, making a reindex take multiple days (based on the runtime of the individual queries, but can't tell for sure, as i'm not waiting that long), when on previous versions it took 4 hours

Additional information

When catalog_product_index_price_temp is created, it's based on catalog_product_index_price_tmp , which since 263c17f has had different primary key and indexes to the main table, meaning this JOIN has no indexes to use for the select, and is forced to do a full tablescan

We reproduced it as follows:

  1. All indexes have been switched to realtime mode - bin/magento indexer:set-mode realtime

  2. Then we started indexing only for MSI - bin/magento indexer:reindex inventory

As @vladimirspucko described above, the inventory indexing itself was performed quickly, about a minute for 850k products in our case. But then a plugin came into operation, which starts price indexing in realtime mode: \Magento\InventoryCatalog\Plugin\InventoryIndexer\Indexer\Stock\Strategy\Sync\PriceIndexUpdatePlugin::afterExecuteList()

which calls \Magento\Catalog\Model\Indexer\Product\Price\Processor::reindexList()

And if the mode is not schedule this indexing has never been performed on our installation with 850k products.

Note: if we use schedule mode, then full inventory indexing takes about 50 seconds for all the same 850k products

Release note

No response

Triage and priority

  • [X] Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • [ ] Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • [ ] Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • [ ] Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • [ ] Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.

nicka101 avatar Jul 03 '23 12:07 nicka101

Hi @nicka101. Thank you for your report. To speed up processing of this issue, make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, Add a comment to the issue:


Join Magento Community Engineering Slack and ask your questions in #github channel. :warning: According to the Magento Contribution requirements, all issues must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting. :clock10: You can find the schedule on the Magento Community Calendar page. :telephone_receiver: The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket.

m2-assistant[bot] avatar Jul 03 '23 12:07 m2-assistant[bot]

Hi @engcom-Dash. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

    1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
    1. Verify that issue has a meaningful description and provides enough information to reproduce the issue.
    1. Add Area: XXXXX label to the ticket, indicating the functional areas it may be related to.
    1. Verify that the issue is reproducible on 2.4-develop branch
      Details- Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
      - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
      - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!

m2-assistant[bot] avatar Aug 10 '23 05:08 m2-assistant[bot]

Hi @engcom-Hotel. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

  • [ ] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
  • [ ] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue.
  • [ ] 3. Add Area: XXXXX label to the ticket, indicating the functional areas it may be related to.
  • [ ] 4. Verify that the issue is reproducible on 2.4-develop branch
    Details- Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!
  • [ ] 5. Add label Issue: Confirmed once verification is complete.
  • [ ] 6. Make sure that automatic system confirms that report has been added to the backlog.

m2-assistant[bot] avatar Aug 16 '23 09:08 m2-assistant[bot]

Hello @nicka101,

Thanks for the report and contribution!

We have tried to reproduce the issue in 2.4-develop as well as 2.4.4 just prior to 2.4.5 (Before that commit), but it seems the issue is not reproducible for us.

We have tried to reproduce the issue with almost 39360 products, by changing the price of the products but for us the reindexing took a similar time.

Have you checked the same with some profiling tools?

Thanks

engcom-Hotel avatar Aug 16 '23 10:08 engcom-Hotel

Faced with the same issue on a catalog of 850k products, the proposed solution really speeds up the indexing process.

DmitryFurs avatar Sep 04 '23 14:09 DmitryFurs

Hello @DmitryFurs,

Thanks for the response!

Can you please let us know the steps to reproduce the issue? Actually, in the first attempt, we were unable to reproduce the issue. You can got through our comment here https://github.com/magento/magento2/issues/37700#issuecomment-1680321607.

Thanks

engcom-Hotel avatar Sep 05 '23 05:09 engcom-Hotel

We faced the same issue using Magento 2.4.6 + MSI. We have ~2600 products (simple, configurable, bundle) and 9 sources/stocks. In our particular case, the price index itself takes less than 1 minute but the inventory index takes more than 1 hour. With the proposed solution, the inventory index takes less than 2 minutes now.

vladimirspucko avatar Sep 05 '23 06:09 vladimirspucko

Hi @engcom-Hotel,

We reproduced it as follows:

  1. All indexes have been switched to realtime mode - bin/magento indexer:set-mode realtime

  2. Then we started indexing only for MSI - bin/magento indexer:reindex inventory

As @vladimirspucko described above, the inventory indexing itself was performed quickly, about a minute for 850k products in our case. But then a plugin came into operation, which starts price indexing in realtime mode: \Magento\InventoryCatalog\Plugin\InventoryIndexer\Indexer\Stock\Strategy\Sync\PriceIndexUpdatePlugin::afterExecuteList()

which calls \Magento\Catalog\Model\Indexer\Product\Price\Processor::reindexList()

And if the mode is not schedule this indexing has never been performed on our installation with 850k products.

Note: if we use schedule mode, then full inventory indexing takes about 50 seconds for all the same 850k products

DmitryFurs avatar Sep 05 '23 12:09 DmitryFurs

Hello @DmitryFurs,

Thanks for the quick response!

The issue is reproducible for us in the 2.4-develop branch. We have tried to reproduce the issue with almost 40K products.

Hence confirming the issue.

Thanks

engcom-Hotel avatar Sep 06 '23 14:09 engcom-Hotel

:white_check_mark: Jira issue https://jira.corp.adobe.com/browse/AC-9470 is successfully created for this GitHub issue.

github-jira-sync-bot avatar Sep 06 '23 14:09 github-jira-sync-bot

:white_check_mark: Confirmed by @engcom-Hotel. Thank you for verifying the issue.
Issue Available: @engcom-Hotel, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

m2-assistant[bot] avatar Sep 06 '23 14:09 m2-assistant[bot]

This is really a big issue. My partial price index takes several hours. I solved it by just adding an index to the "catalog_product_index_price_tmp" for "entity_id", "customer_group_id", "website_id"

mvenghaus avatar Sep 18 '23 16:09 mvenghaus

Any updates on this issue? In our case, reindexing of 100k products takes hours... On custom product bulk import, we try to reindex only changed products using the reindexList method. It takes hours. If we do the full reindex with the reindexAll method, it takes a couple of minutes.

speller avatar Mar 03 '24 16:03 speller

In Magento 2.4.7, the issue is still there.

Interestingly, there's a quality patch ACSD-56415 Fixes the issue where the performance of the partial price indexing is slowed down due to a DELETE query when the database has a lot of partial price data to index for 2.4.6-p3 which fixes the issue but this patch is not available for 2.4.7. The ACSD-56415 patch does exactly the same as what @mvenghaus mentioned.

I'm surprised that maintainers didn't mention this patch here as a possible solution.

I've adopted the ACSD-56415 patch to 2.4.7 by converting it to a composer patch and can confirm that it fixes the issue on my side. For example, without the patch, on my dev machine, 250 products were reindexed in 10 mins so I didn't even wait for the full reindex. It's clear that it'll take hours for 50K products. The full reindex of the 2M products database took 80 mins. With the patch applied, 50K products were reindexed in 1 minute.

speller avatar Apr 25 '24 06:04 speller

Hello @speller,

The patch will be incorporated into the forthcoming release, namely 2.4.8-beta1. But we will update you if this can be incorporated sooner.

Thanks

engcom-Hotel avatar May 07 '24 13:05 engcom-Hotel

@engcom-Hotel could you also make the patch available for 2.4.7?

speller avatar May 07 '24 13:05 speller

@speller I will update you on this soon.

engcom-Hotel avatar May 07 '24 13:05 engcom-Hotel

This was actually introduced in 2.4.6: https://github.com/magento/magento2/blob/2.4.6/app/code/Magento/Catalog/Model/Indexer/Product/Price/AbstractAction.php

We've experienced the indexing to not complete even though it was left running overnight from the end of the working day to the start of the new one due to a very large amount of prices; there's multiple stores, multiple customer groups and a lot of products.

I can see that the solution proposed at https://github.com/magento/magento2/pull/37701 has been rejected because indexes slow down table writes. However, the table writes are for a temporary table that is only involved in the indexing process and for the most part the tradeoff is better overall.

I also tried changing the sql query itself to improve performance. This worked well, but I've found the best results, for our case anyway, was to combine both the query change and add indexes to the temporary table catalog_product_index_price_tmp as in the pull request https://github.com/magento/magento2/pull/37701 (also linked above).

magento/vendor/magento/module-catalog/etc/db_schema.xml

        <constraint xsi:type="unique" referenceId="CAT_PRD_IDX_PRICE_TMP_ENTT_ID_CSTR_GROUP_ID_WS_ID">
            <column name="entity_id"/>
            <column name="customer_group_id"/>
            <column name="website_id"/>
        </constraint>

Note I've not fully tested data integrity yet with this. Probably be on Monday 17th I get back to look at that. I'll post a patch if all looks good then.

EDIT: See below

andy-igoo avatar Jun 13 '24 14:06 andy-igoo

This was actually introduced in 2.4.6: https://github.com/magento/magento2/blob/2.4.6/app/code/Magento/Catalog/Model/Indexer/Product/Price/AbstractAction.php

We've experienced the indexing to not complete even though it was left running overnight from the end of the working day to the start of the new one due to a very large amount of prices; there's multiple stores, multiple customer ....

On further testing we've found that the change on the sql query only makes a slight difference so in terms in risk and reward, it's probably better just leaving the query alone.

If anyone needs the patch (based on https://github.com/magento/magento2/pull/37701) to change the table indexes then here's a patch for 2.4.7:

37700_inventoryIndexer.patch

diff --git a/vendor/magento/module-catalog/etc/db_schema_whitelist.json b/vendor/magento/module-catalog/etc/db_schema_whitelist.json
--- a/vendor/magento/module-catalog/etc/db_schema_whitelist.json	2024-06-17 11:12:06.595496943 +0100
+++ b/vendor/magento/module-catalog/etc/db_schema_whitelist.json	2024-06-14 08:27:31.648158048 +0100
@@ -987,7 +987,8 @@
             "final_price": true,
             "min_price": true,
             "max_price": true,
-            "tier_price": true
+            "tier_price": true,
+            "id": true
         },
         "index": {
             "CATALOG_PRODUCT_INDEX_PRICE_TMP_CUSTOMER_GROUP_ID": true,
@@ -995,7 +996,8 @@
             "CATALOG_PRODUCT_INDEX_PRICE_TMP_MIN_PRICE": true
         },
         "constraint": {
-            "PRIMARY": true
+            "PRIMARY": true,
+            "CAT_PRD_IDX_PRICE_TMP_ENTT_ID_CSTR_GROUP_ID_WS_ID": true
         }
     },
     "catalog_category_product_index_tmp": {

diff --git a/vendor/magento/module-catalog/etc/db_schema.xml b/vendor/magento/module-catalog/etc/db_schema.xml
--- a/vendor/magento/module-catalog/etc/db_schema.xml	2024-06-17 11:11:56.495474101 +0100
+++ b/vendor/magento/module-catalog/etc/db_schema.xml	2024-06-14 08:28:02.284226062 +0100
@@ -1645,6 +1645,11 @@
         <constraint xsi:type="primary" referenceId="PRIMARY">
             <column name="id"/>
         </constraint>
+        <constraint xsi:type="unique" referenceId="CAT_PRD_IDX_PRICE_TMP_ENTT_ID_CSTR_GROUP_ID_WS_ID">
+            <column name="entity_id"/>
+            <column name="customer_group_id"/>
+            <column name="website_id"/>
+        </constraint>
     </table>
     <table name="catalog_category_product_index_tmp" resource="default" engine="innodb"
            comment="Catalog Category Product Indexer temporary table">

Apply patch with patch -p1 --forward < 37700_inventoryIndexer.patch

::EDIT::

⚠️ WARNING: Do not use on Magento 2.4.8, it will break your schema declarations.

andy-igoo avatar Jun 17 '24 11:06 andy-igoo

would it be possible to apply the patch to magento 2.3.7 ? https://github.com/nicka101/magento2/commit/90e4ecc06fa1be08f736c8a5fbe699430e9bc3c7

vandark25 avatar Oct 20 '24 21:10 vandark25

@engcom-Hotel

The patch in the Quality Patch Tool is not available yet for 2.4.7, and the change is not implemented in 2.4.7-p4 (only in 2.4.-develop with 2.4.8-beta1 as the first tag containing the commit, see details).

Please can you check if the patch will be released officially for 2.4.7?

mzeis avatar Mar 11 '25 12:03 mzeis

Hello @mzeis,

Thanks for raising the concern!

We are in discussion for this concern with the internal team. I will update you soon on this. Meanwhile moving this issue on hold.

Thanks

engcom-Hotel avatar Mar 13 '25 11:03 engcom-Hotel

Hello @mzeis,

We have an update from the internal team, they have created this JIRA for the patch release for 2.4.7 release line.

I will update you once same has been done. Till the time keeping this issue On Hold status.

Thanks

engcom-Hotel avatar May 08 '25 12:05 engcom-Hotel

Hello @mzeis @nicka101,

We have received a reply from the internal team, the patch for this issue has been available now. Now the patch should be available for download.

Thanks

engcom-Hotel avatar Jun 10 '25 14:06 engcom-Hotel