On-schedule indexers stuck in "working" status

Open Tomasito665 opened this issue 2 years ago • 26 comments

Preconditions and environment

Description

For "on-schedule" indexers, the indexer_update_all_views cron job runs every minute to work through the backlog of changed entities and update the corresponding indexes. In Adobe Commerce cloud environments, the system sometimes terminates this cron job when it runs low on memory. If this happens during an index update, the index stays stuck in the "working" status indefinitely, and getting it unstuck requires manual action. The following diagram provides an overview.

image
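
The failure mode described above can be illustrated without Magento. The following is a minimal simulation, not Magento code: a temporary file (`status_file`, a hypothetical stand-in) plays the role of the stored indexer status, and a background subshell plays the role of the cron job. The job writes "working", does slow work, and would write "idle" on success, but it is terminated before it gets there.

```shell
#!/bin/sh
# Simulation only: a temp file stands in for the status that Magento keeps in the database.
status_file=$(mktemp)
echo idle > "$status_file"

# Stand-in for the cron job: mark "working", do slow work, mark "idle" on success.
(
    echo working > "$status_file"
    sleep 5 >/dev/null 2>&1
    echo idle > "$status_file"
) &
job_pid=$!

# Let the job start, then terminate it with the default SIGTERM, like `kill $PID`.
sleep 1
kill "$job_pid"
wait "$job_pid" 2>/dev/null

# Nothing ever wrote the status back, so it still reads "working".
final_status=$(cat "$status_file")
echo "status after kill: $final_status"
```

Because the process is gone before it can reset the flag, every later run that checks the flag first would see "working" and skip the update.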

Magento version

2.4.5

Steps to reproduce

Idea

The easiest way to reproduce this issue is to make the update for a particular index artificially slow by adding a sleep. We can then run the job manually and kill it from another terminal while it is running, which leaves the indexer frozen. This works with any indexer. In the steps below, we use the product price indexer.

Steps

  1. Disable the automatic execution of cron jobs. We will run them manually for more control.
  2. Install n98-magerun2. We will use this tool to run the indexer_update_all_views cron job in isolation.
  3. Set the catalog_product_price indexer mode to schedule.
  4. Make Indexer\Product\Price::execute artificially slow by adding sleep(300);.
  5. Change the price of any product to add it to the backlog of price updates.
  6. Run bin/magento indexer:status catalog_product_price — it should show "x in backlog".
  7. Run n98-magerun2 sys:cron:run indexer_update_all_views to run the cron job, and note its PID.
  8. Within 300 seconds, from another terminal, kill the above process with kill $PID.
  9. Remove the sleep(300); and re-run steps 5 and 6 to simulate a non-slow, successful index update.
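
As a rough sketch, the scriptable parts of the steps above (1, 3, 6, 7 and 8) could look like this when run from the Magento root. Steps 2, 4 and 5 stay manual. The function names and the `MAGENTO_BIN` and `MAGERUN_BIN` variables are hypothetical helpers introduced for this sketch, and it assumes that n98-magerun2 is on the PATH and that the Magento crontab entries were installed with cron:install.

```shell
#!/bin/sh
# Hypothetical overrides for this sketch; the defaults assume the Magento root as working directory.
MAGENTO_BIN="${MAGENTO_BIN:-bin/magento}"
MAGERUN_BIN="${MAGERUN_BIN:-n98-magerun2}"

prepare_indexer() {
    # Step 1: remove the Magento entries from the crontab so nothing runs automatically.
    "$MAGENTO_BIN" cron:remove
    # Step 3: switch the price indexer to schedule mode.
    "$MAGENTO_BIN" indexer:set-mode schedule catalog_product_price
}

run_and_kill() {
    # Step 6: confirm that the backlog is not empty.
    "$MAGENTO_BIN" indexer:status catalog_product_price
    # Step 7: start the cron job in the background and remember its PID.
    "$MAGERUN_BIN" sys:cron:run indexer_update_all_views &
    job_pid=$!
    # Step 8: give the job time to reach the sleep(300), then send SIGTERM.
    sleep "${1:-10}"
    kill "$job_pid"
    wait "$job_pid" 2>/dev/null
    # Show the indexer status after the job was terminated.
    "$MAGENTO_BIN" indexer:status catalog_product_price
}

# Usage, after adding the sleep (step 4) and changing a price (step 5):
#   prepare_indexer
#   run_and_kill 10
```

Running the job in the background of the same shell replaces the second terminal from step 8; the argument to `run_and_kill` is the number of seconds to wait before the kill.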

Expected result

The indexer updates the product price index successfully to include the new price.

Actual result

The product price indexer gets frozen, which keeps it from processing any further price changes.

image

Additional information

Logs from "how to reproduce" steps 7 and 8.

# Terminal 2
app@7ab8e61b7445:~/html$ ps axuf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
app        466  0.1  0.0   4100  3280 pts/3    Ss   13:05   0:00 bash
app        473  0.0  0.0   6700  2948 pts/3    R+   13:05   0:00  \_ ps axuf
app        451  0.0  0.0   4100  3448 pts/2    Ss   13:04   0:00 bash
app        462  1.7  0.9 223156 153088 pts/2   S+   13:04   0:00  \_ php vendor/n98/magerun2-dist/n98-magerun2 sys:cron:run indexer_update_all_views
app        406  0.2  0.0  20508 14588 pts/1    Ss+  12:59   0:00 mysql -hdb -umagento -px xxxxx magento
app         45  0.0  0.0   4100  3352 pts/0    Ss+  12:43   0:00 bash
app         24  0.0  0.0   2420   520 ?        Ss   12:43   0:00 sh /var/www/.composer-global/vendor/bin/cache-clean.js --quiet --watch
app         34  0.3  0.4 649828 78948 ?        Sl   12:43   0:05  \_ node /var/www/.composer-global/vendor/mage2tv/magento-cache-clean/bin/cache-clean.js --quiet --watch
app          1  0.0  0.2 236276 36008 ?        Ss   12:43   0:00 php-fpm: master process (/usr/local/etc/php-fpm.conf)
app        399  0.7  0.8 259592 133236 ?       S    12:58   0:03 php-fpm: pool www
app        400  3.1  0.9 282456 156260 ?       S    12:58   0:13 php-fpm: pool www
app        401  0.5  0.5 253720 89976 ?        S    12:59   0:02 php-fpm: pool www
app        402  0.5  0.7 337404 126924 ?       S    12:59   0:01 php-fpm: pool www
app        403  0.5  0.7 327488 127156 ?       S    12:59   0:02 php-fpm: pool www
app        404  0.5  0.8 262552 136764 ?       S    12:59   0:01 php-fpm: pool www
app@7ab8e61b7445:~/html$ kill 462
app@7ab8e61b7445:~/html$ 

# Terminal 1
app@7ab8e61b7445:~/html$ vendor/n98/magerun2-dist/n98-magerun2 sys:cron:run indexer_update_all_views
Run Magento\Indexer\Cron\UpdateMview::execute Terminated

Release note

No response

Triage and priority

  • [ ] Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • [X] Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • [ ] Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • [ ] Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • [ ] Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.

Tomasito665 avatar Jan 10 '23 17:01 Tomasito665

Hi @Tomasito665. Thank you for your report. To speed up processing of this issue, make sure that you provided the following information:

  • Summary of the issue
  • Information on your environment
  • Steps to reproduce
  • Expected and actual results

Make sure that the issue is reproducible on a vanilla Magento instance by following the steps to reproduce. To deploy a vanilla Magento instance in our environment, add a comment to the issue:

@magento give me 2.4-develop instance - upcoming 2.4.x release

For more details, review the Magento Contributor Assistant documentation.

Add a comment to assign the issue: @magento I am working on this

To learn more about issue processing workflow, refer to the Code Contributions.


:warning: According to the Magento Contribution requirements, all issues must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting.

:clock10: You can find the schedule on the Magento Community Calendar page.

:telephone_receiver: The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket.

:pencil2: Feel free to post questions/proposals/feedback related to the Community Contributions Triage process to the corresponding Slack Channel

m2-assistant[bot] avatar Jan 10 '23 17:01 m2-assistant[bot]

Hi @engcom-Dash. Thank you for working on this issue. To make sure that the issue has enough information and is ready for development, please read and check the following instructions: :point_down:

    1. Verify that the issue has all the required information (Preconditions, Steps to reproduce, Expected result, Actual result).
      Details: If the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please edit the issue description if needed, until the label Issue: Format is valid appears.
    2. Verify that the issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add the Issue: Clear Description label to the issue yourself.
    3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.
    4. Verify that the issue is reproducible on the 2.4-develop branch.
      Details:
      - Add the comment @magento give me 2.4-develop instance to deploy a test instance on Magento infrastructure.
      - If the issue is reproducible on the 2.4-develop branch, please add the label Reproduced on 2.4.x.
      - If the issue is not reproducible, add a comment saying so, close the issue, and stop the verification process here.

m2-assistant[bot] avatar Jan 11 '23 00:01 m2-assistant[bot]

@magento give me 2.4-develop instance

Tomasito665 avatar Jan 13 '23 16:01 Tomasito665

Hi @Tomasito665. Thank you for your request. I'm working on Magento instance for you.

Hi @Tomasito665, here is your Magento Instance: https://e095d94e9dfca546500fc7a300209aa2.instances.magento-community.engineering
Admin access: https://e095d94e9dfca546500fc7a300209aa2.instances.magento-community.engineering/admin_5943
Login: 3b662afd
Password: c205e87eb57f

@magento give me 2.4-develop instance

I tried reproducing the issue with this instance. However, the steps in the issue description require shell access to the machine, which the instances spun up by m2-assistant do not provide. It should be possible to reproduce the bug locally or in any other environment where you have SSH access. I reproduced the bug on my local machine with Magento versions 2.4.4 and 2.4.5.

Tomasito665 avatar Jan 13 '23 17:01 Tomasito665

Hi @Tomasito665. Thank you for your request. I'm working on Magento instance for you.

Hi @Tomasito665, here is your Magento Instance: https://e095d94e9dfca546500fc7a300209aa2.instances.magento-community.engineering
Admin access: https://e095d94e9dfca546500fc7a300209aa2.instances.magento-community.engineering/admin_12f0
Login: 752a361f
Password: 35a014e9e2c6

Hi @engcom-Hotel. Thank you for working on this issue. To make sure that the issue has enough information and is ready for development, please read and check the following instructions: :point_down:

  • [ ] 1. Verify that the issue has all the required information (Preconditions, Steps to reproduce, Expected result, Actual result).

    Details: If the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please edit the issue description if needed, until the label Issue: Format is valid appears.

  • [ ] 2. Verify that the issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add the Issue: Clear Description label to the issue yourself.

  • [ ] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • [ ] 4. Verify that the issue is reproducible on the 2.4-develop branch.

    Details:
    - Add the comment @magento give me 2.4-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on the 2.4-develop branch, please add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add a comment saying so, close the issue, and stop the verification process here.

  • [ ] 5. Add the label Issue: Confirmed once verification is complete.

  • [ ] 6. Make sure that the automatic system confirms that the report has been added to the backlog.

m2-assistant[bot] avatar Jan 16 '23 10:01 m2-assistant[bot]

Hello @Tomasito665,

Thanks for the report and collaboration!

We tried to reproduce the issue on a Magento 2.4-develop instance, and it is reproducible for us with the exact steps mentioned.

Please refer to the screenshots for reference:

Admin Panel

image

Error in Terminal

image

Hence confirming the issue.

Thanks

engcom-Hotel avatar Jan 16 '23 11:01 engcom-Hotel

:white_check_mark: Jira issue https://jira.corp.adobe.com/browse/AC-7685 is successfully created for this GitHub issue.

github-jira-sync-bot avatar Jan 16 '23 11:01 github-jira-sync-bot

:white_check_mark: Confirmed by @engcom-Hotel. Thank you for verifying the issue.
Issue Available: @engcom-Hotel, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

m2-assistant[bot] avatar Jan 16 '23 11:01 m2-assistant[bot]

@Tomasito665: out of curiosity, have you already tried the indexer config setting use_application_lock? See the documentation describing it.

Basically:

  • Without the setting, Magento uses the database to keep the state of an indexer. If a process crashes halfway through its execution, the database never gets the correct status.
  • With the setting enabled, Magento uses its lock manager. If the process crashes, the lock is freed, and Magento knows that it can try to reindex the indexer again on the next cron execution.

Maybe this helps in your case?
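
The difference between a stored status and a lock can be sketched outside Magento. The following is only an analogy, assuming a Linux system where the `flock` utility is available; it is not how Magento's lock manager is implemented. A lock held through file descriptor 9 stands in for the application lock, and `indexer_status` is a hypothetical helper that derives the status from whether the lock is currently held.

```shell
#!/bin/sh
# Analogy only: flock(1) stands in for a lock manager; lock_file is a throwaway temp file.
lock_file=$(mktemp)

indexer_status() {
    # If the non-blocking probe gets the lock, nobody is indexing.
    if flock -n "$lock_file" true; then echo idle; else echo working; fi
}

# Stand-in for a slow index update that holds the lock for as long as the process lives.
# The sleep closes descriptor 9, so only the subshell itself keeps the lock open.
(
    flock 9
    sleep 5 9>&- >/dev/null 2>&1
) 9>"$lock_file" &
job_pid=$!

sleep 1
status_during=$(indexer_status)

# Terminate the job; the operating system drops the lock together with the process.
kill "$job_pid"
wait "$job_pid" 2>/dev/null
status_after=$(indexer_status)

echo "during: $status_during, after kill: $status_after"
```

Here the "working" state cannot outlive the process, because nothing has to be written back after a crash. Whether this carries over to the on-schedule case from the report is exactly what the thread is trying to establish.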

hostep avatar Jan 19 '23 21:01 hostep

Unfortunately, at least in 2.4.4-p1, using use_application_lock as described here does not seem to solve the issue:

Screenshot 2023-03-07 at 15.32.57, Screenshot 2023-03-07 at 15.33.24, Screenshot 2023-03-07 at 15.33.36

adrian-martinez-vdshop avatar Mar 07 '23 14:03 adrian-martinez-vdshop

@magento I am working on this

nidhigupta13-ey avatar Aug 22 '23 10:08 nidhigupta13-ey

Hello! Any update on this? I have this issue on my 2.4.6-p2.

davideleonelli99 avatar Sep 21 '23 13:09 davideleonelli99

Same issue on 2.4.5-p4 Cloud. Any idea how to set use_application_lock in the .magento.env.yaml file?

alexandrosk avatar Sep 26 '23 13:09 alexandrosk

Hi @alexandrosk. I suggest you try this Adobe Quality Patch: https://docs.mktossl.com/docs/commerce-knowledge-base/kb/support-tools/patches/v1-1-33/acsd-51431-indexer-status-is-working.html?lang=en. I tested it on my staging environment and it seems to work. I will deploy it to production next week.

davideleonelli99 avatar Sep 29 '23 08:09 davideleonelli99

@davideleonelli99 how did it go?

kanevbg avatar Nov 28 '23 12:11 kanevbg

here

We are also facing the same issue in Magento ver. 2.4.6-p3.

magento360 avatar Jan 08 '24 06:01 magento360

Same issue here. We need to run indexer:reindex every 10-15 minutes to get the indexes updated.

santerref avatar Jan 15 '24 09:01 santerref

Hello @santerref,

Have you tried the use_application_lock approach mentioned here https://github.com/magento/magento2/issues/36724#issuecomment-1397652690? This might resolve your issue.

The devdocs page below describes how to configure it:

https://developer.adobe.com/commerce/php/development/components/indexing/#using-application-lock-mode-for-reindex-processes

Thanks

engcom-Hotel avatar Jan 16 '24 08:01 engcom-Hotel

Yes, we already use this setting and the issue is still there.

santerref avatar Feb 02 '24 12:02 santerref

Hello @santerref,

We tried to reproduce the issue on the latest development branch, and it is reproducible for us with the mentioned steps. However, enabling use_application_lock resolves it.

Please try to reproduce the issue on the latest development branch and let us know whether it is still reproducible for you.

Thanks

engcom-Hotel avatar Feb 05 '24 12:02 engcom-Hotel

Hello @engcom-Hotel, how do I set use_application_lock on Adobe Cloud? I cannot find it in the list of deploy variables here: https://experienceleague.adobe.com/docs/commerce-cloud-service/user-guide/configure/env/stage/variables-deploy.html?lang=en

namluu avatar Feb 28 '24 16:02 namluu

Hi, something like this?

variables:
    env:
        CONFIG__DEFAULT__INDEXER__USE_APPLICATION_LOCK: true

yasser-moudouani avatar Feb 29 '24 15:02 yasser-moudouani

@engcom-Hotel please document at https://experienceleague.adobe.com/docs/commerce-cloud-service/user-guide/configure/env/stage/variables-deploy.html?lang=en how to set use_application_lock in env.php via .magento.env.yaml, or provide another way of adding this setting.

I have the same issue in version 2.4.6-p3.

hannes011 avatar Mar 13 '24 14:03 hannes011

Hello @hannes011,

Please follow the below devdocs to configure use_application_lock:

https://developer.adobe.com/commerce/php/development/components/indexing/#using-application-lock-mode-for-reindex-processes

engcom-Hotel avatar Mar 18 '24 11:03 engcom-Hotel

@engcom-Hotel sorry, I was not clear enough. I am using Magento Cloud, where env.php is not part of the repository, so the approach in your documentation link does not apply: there is no documented way to change that flag in env.php on Cloud. (I can edit the file, but no documentation guarantees that a Magento Cloud deployment mechanism won't override it.) That's why I'm asking for a Magento Cloud-compliant way (e.g. via .magento.env.yaml) to change env.php.

hannes011 avatar Mar 25 '24 11:03 hannes011

Somebody on Slack recently asked the same question and, after a while, came back with this reply from Cloud support:

I just got an answer from support and they told me to set it like this: env:CONFIG__DEFAULT__INDEXER__USE_APPLICATION_LOCK. So I guess it is something that can also be set in core_config_data.

Maybe that works? If it doesn't, I would strongly suggest you ask Cloud support this same question. The thread here is about Magento Open Source and few people here know how Magento Cloud works...

hostep avatar Mar 25 '24 13:03 hostep