magento2 icon indicating copy to clipboard operation
magento2 copied to clipboard

Performance issue and scalability of popular search term cache

Open mflott opened this issue 4 years ago • 28 comments

Upon upgrading our site to 2.3, we saw a sudden increase in database CPU load. By looking at the processlist we saw a frequent query that took a lot of time:

SELECT DISTINCT COUNT(*) FROM search_queryASmain_table WHERE (main_table.store_id = 1) AND (num_results > 0)

We found that this is part of the 2.3.0 update where they added Popular Search Term Cache (This pull request) The larger the search_query table is, the longer the query takes to complete.

Preconditions (*)

  1. Magento Commerce 2.3.4
  2. ElasticSearch 6.7.0

Steps to reproduce (*)

  1. Make a query in the main search bar.

Expected result (*)

  1. The query is expected to finish within a reasonable amount of time.

Actual result (*)

  1. The query takes a long time to finish, depending on the size of the search_query table.

On our live site we have about 2,7 million search terms, so when doing queries this puts a lot of strain on the database and the queries are very slow. The main part that takes time is the num_results > 0 part - when we tested without this in our local testing environment we got a query duration of 0,7 sec., while if we included it it takes approx. 23 sec.

mflott avatar Apr 03 '20 07:04 mflott

Hi @mflott. Thank you for your report. To help us process this issue please make sure that you provided the following information:

  • [ ] Summary of the issue
  • [ ] Information on your environment
  • [ ] Steps to reproduce
  • [ ] Expected and actual results

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento give me 2.4-develop instance - upcoming 2.4.x release

For more details, please, review the Magento Contributor Assistant documentation.

@mflott do you confirm that you were able to reproduce the issue on vanilla Magento instance following steps to reproduce?

  • [ ] yes
  • [ ] no

m2-assistant[bot] avatar Apr 03 '20 07:04 m2-assistant[bot]

Hi @mflott do you know what the impact is of removing the AND (num_results > 0) part? Or do you have any other suggestion to improve this? We are seeing the same issues here with larger datasets.

igorwulff avatar May 27 '20 06:05 igorwulff

Hi @igorwulff . I'm not sure if removing the num_results will have any impact, would have to check that part a bit deeper.

Instead of a patch for reverting the PR, we decided to override the execute function in Magento_CatalogSearch/Controller/Result/Index instead and changed it to only use the getNotCacheableResult part, and removed code for the getCacheableResult part. This pretty much changes it back to how it performed before the PR.

mflott avatar May 28 '20 06:05 mflott

Seeing this issue also

convenient avatar Nov 03 '20 12:11 convenient

Just got hit by this too on 2.3.5-p1 when a campaign caused a surge in traffic. Hotfixed by doing what @mflott suggested and forcing the controller code take the getNotCacheableResult path.

mattijv avatar Nov 11 '20 08:11 mattijv

Initial testing suggests that adding this index on the search_query table may resolve the issue. I did not observe much increase in write loads, but this is yet to be tested in production.

<index referenceId="SEARCH_QUERY_STORE_ID_NUM_RESULTS" indexType="btree">
    <column name="store_id"/>
    <column name="num_results"/>
</index>

mattijv avatar Nov 13 '20 07:11 mattijv

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 14 days if no further activity occurs. Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? Thank you for your contributions!

stale[bot] avatar Jan 28 '21 21:01 stale[bot]

This issue is still relevant.... a suggestion to fix this was added by @mattijv and should be checked by Magento.

igorwulff avatar Jan 29 '21 08:01 igorwulff

Have same this issue even not using Elastic Search and lower magento 2 version

tuyennn avatar Feb 19 '21 09:02 tuyennn

This issue is still relevant on a 2.4.1-p1 magento website with PHP 7.4 & mysql 8.0.

leonhelmus avatar Mar 29 '21 06:03 leonhelmus

Hi @engcom-Bravo. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

  • [ ] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).

    DetailsIf the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.

  • [ ] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • [ ] 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • [ ] 4. Verify that the issue is reproducible on 2.4-develop branch

    Details- Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!

  • [ ] 5. Add label Issue: Confirmed once verification is complete.

  • [ ] 6. Make sure that automatic system confirms that report has been added to the backlog.

m2-assistant[bot] avatar Jun 10 '21 07:06 m2-assistant[bot]

We have tested it from code side on 2.4-develop branch and confirming this issue.

Even on disabling search suggestions from admin panel and enabling elastic search as per https://devdocs.magento.com/guides/v2.3/release-notes/open-source-2-3-6.html#search , main search bar still hits the MYSQL 'search_query' table.

Due to this search_query table grows and cause performance issue. We need to either index this or use elastic-search for all search related operations.

Admin panel configurations:

image

db.log file after above configurations and search:

image

engcom-Bravo avatar Jun 18 '21 15:06 engcom-Bravo

:white_check_mark: Confirmed by @engcom-Bravo Thank you for verifying the issue. Based on the provided information internal tickets MC-42673 were created

Issue Available: @engcom-Bravo, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

magento-engcom-team avatar Jun 18 '21 15:06 magento-engcom-team

:x: Cannot export the issue. This GitHub issue is already linked to Jira issue(s): https://jira.corp.magento.com/browse/AC-932

github-jira-sync-bot avatar Aug 25 '21 08:08 github-jira-sync-bot

@mflott The issue has been resolved on 2.4-develop branch.

Thank you!

engcom-Echo avatar Oct 19 '21 03:10 engcom-Echo

So where is the PR for this or patch? I need to apply it.

multipasko avatar Oct 28 '21 20:10 multipasko

@multipasko, it's probably this one: https://github.com/magento/magento2/commit/c90edaa7fbfb8bcd0098de1981e4e022a79a8c13 (the commit has the same ticket number as referenced above)

hostep avatar Nov 02 '21 21:11 hostep

Hello,

We have below query which is continuous recorded as slow query with magento version 2.4.1

SELECT COUNT(*) FROM (SELECT DISTINCT main_table.query_text FROM search_query AS main_table WHERE (main_table.store_id IN (1)) AND (num_results > 0) ORDER BY popularity desc LIMIT 100) AS result WHERE (result.query_text = 'test')

so we have to apply https://github.com/magento/magento2/commit/c90edaa7fbfb8bcd0098de1981e4e022a79a8c13 this one as patch?

Can you please confirm!

Thanks, David Andersson

davidandersson1 avatar Mar 24 '22 11:03 davidandersson1

Still having this issue in 2.4.4-p2. This doesn't seem to be fixable with the way it is currently structured. The reason why is it uses select distinct on a table that is going to have millions of records. Why even cache it?

sdouma avatar Nov 17 '22 18:11 sdouma

I fail to see how c90edaa could possibly fix the performance issue. All it does is add columns to the select query, which would only add MORE lift for mysql to need to compute distinctness. It is not difficult for this table (in my case) to reach 700k entries. Many likely reach far more. Search requests also have a much higher chance for page cache misses. It's no surprise that mysql performance is a concern in production when isTopSearchResult takes 700ms on a beefy local machine.

Removing the DISTINCT operator from the query reduces it from 700ms down to 2ms and produces the same results. The results are always going to be the same too, because the search_query table already has as unique constraint on query_text and store_id, and in all cases, the store_id is being filtered and query_text is being used as a column in the select.

The solution here should be to simply remove the the DISTINCT operator from queries within \Magento\Search\Model\ResourceModel\Query\Collection, because those DISTINCT operators are effectively not doing anything, and only hurting your query performance.

However, if your performance is caused by a a high search term cardinality (lots of unique terms), it's possible that the mysql inserts are too expensive for your project. The solution in THAT case is a bit more involved. You could instead:

  • asynchronously insert search terms into the table in batches to prevent it from slowing down search request times.
  • only insert a fraction of the search terms
  • stop tracking search terms entirely (this will cause you to lose visibility to user search term behaviors on your site)

Note: I am using 2.4.2-p2. ymmv. I recommend comparing your version to 2.4.2-p2 to evaluate whether this solution is applicable to your specific project.

cdcrothers avatar Mar 06 '23 20:03 cdcrothers

@engcom-Echo please can you reopen this so that it can be evaluated again. I agree with @cdcrothers that c90edaa7fbfb8bcd0098de1981e4e022a79a8c13 doesn't look like a fix for this.

fredden avatar Mar 16 '23 12:03 fredden

My magento 2 admin panel is not working at all when I run "show full processlist;" in MySQL, it shows this query running multiple times "SELECT DISTINCT COUNT(*) FROM search_query AS main_table WHERE (main_table.store_id = 1) AND (num_results > 0)" I am using magento 2.3.4 community version.

nilesh-dhumal-fexle avatar Aug 25 '23 05:08 nilesh-dhumal-fexle

2.4.5-p4 still having issues with this useless feature.

jmonrove avatar Aug 28 '23 21:08 jmonrove

Unfortunately, not enough information was provided to create a Jira ticket. Please make sure you added the following label(s): Reproduced on 2.4.x, ^Area:.*

Once all required labels are present, please add Issue: Confirmed label again.

github-jira-sync-bot avatar Aug 29 '23 14:08 github-jira-sync-bot

:x: Cannot export the issue. This GitHub issue is already linked to Jira issue(s): https://jira.corp.adobe.com/browse/AC-932

github-jira-sync-bot avatar Aug 29 '23 14:08 github-jira-sync-bot

I also facing suddenly high CPU usages on Magento 2.4.4-p1. any quick solution for this.

hemendrametawolf avatar Dec 18 '23 04:12 hemendrametawolf

i am using 2.4.6 commerce version. I have 1.4M rows in search_query table, which is bottle neck of keyword search. Now, keyword search takes around 10 seconds to complete one search, which is ridiculous. Finally, I got the solution and shorten the keyword search function to 2-3 seconds. However, I am not sure is it a right approach.

in /vendor/magento/module-search/Model/ResourceModel/Query/Collection.php public function isTopSearchResult(string $term, int $storeId, int $maxCountCacheableSearchTerms):bool change $select->distinct(false);

sixng avatar Jan 03 '24 10:01 sixng

The issue happens to mine 2.4.5-p5 community and MySQL 8.0. It's worse than @sixng case because we have over 40M rows in search_query table. MySQL Bug #99717 Peformance regression of parallel count explains the performance issue to the "Select COUNT(*)". Besides optimizing the query, I would like to recommend Magento to prevent the spam search text as much as possible. That's because I figured out 99% of the rows of search_query is spam search like below. image

qlhu avatar Feb 25 '24 23:02 qlhu

Any update on this issue? Apparently no one is working on this.

tuyennn avatar Mar 31 '24 04:03 tuyennn

2.4.6-p4 problem still present, I don't think anyone cares... I'm just going to set up a cron to empty the table every certain amount of time.

jmonrove avatar Apr 01 '24 14:04 jmonrove