magento2
magento2 copied to clipboard
Performance issue and scalability of popular search term cache
Upon upgrading our site to 2.3, we saw a sudden increase in database CPU load. By looking at the processlist we saw a frequent query that took a lot of time:
SELECT DISTINCT COUNT(*) FROM
search_queryAS
main_table WHERE (main_table.store_id = 1) AND (num_results > 0)
We found that this is part of the 2.3.0 update where they added Popular Search Term Cache (This pull request) The larger the search_query table is, the longer the query takes to complete.
Preconditions (*)
- Magento Commerce 2.3.4
- ElasticSearch 6.7.0
Steps to reproduce (*)
- Make a query in the main search bar.
Expected result (*)
- The query is expected to finish within a reasonable amount of time.
Actual result (*)
- The query takes a long time to finish, depending on the size of the search_query table.
On our live site we have about 2,7 million search terms, so when doing queries this puts a lot of strain on the database and the queries are very slow. The main part that takes time is the num_results > 0 part - when we tested without this in our local testing environment we got a query duration of 0,7 sec., while if we included it it takes approx. 23 sec.
Hi @mflott. Thank you for your report. To help us process this issue please make sure that you provided the following information:
- [ ] Summary of the issue
- [ ] Information on your environment
- [ ] Steps to reproduce
- [ ] Expected and actual results
Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:
@magento give me 2.4-develop instance
- upcoming 2.4.x release
For more details, please, review the Magento Contributor Assistant documentation.
@mflott do you confirm that you were able to reproduce the issue on vanilla Magento instance following steps to reproduce?
- [ ] yes
- [ ] no
- Join Magento Community Engineering Slack and ask your questions in #github channel.
Hi @mflott do you know what the impact is of removing the AND (num_results > 0)
part? Or do you have any other suggestion to improve this? We are seeing the same issues here with larger datasets.
Hi @igorwulff . I'm not sure if removing the num_results will have any impact, would have to check that part a bit deeper.
Instead of a patch for reverting the PR, we decided to override the execute function in Magento_CatalogSearch/Controller/Result/Index
instead and changed it to only use the getNotCacheableResult part, and removed code for the getCacheableResult part. This pretty much changes it back to how it performed before the PR.
Seeing this issue also
Just got hit by this too on 2.3.5-p1 when a campaign caused a surge in traffic. Hotfixed by doing what @mflott suggested and forcing the controller code take the getNotCacheableResult
path.
Initial testing suggests that adding this index on the search_query
table may resolve the issue. I did not observe much increase in write loads, but this is yet to be tested in production.
<index referenceId="SEARCH_QUERY_STORE_ID_NUM_RESULTS" indexType="btree">
<column name="store_id"/>
<column name="num_results"/>
</index>
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 14 days if no further activity occurs. Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? Thank you for your contributions!
This issue is still relevant.... a suggestion to fix this was added by @mattijv and should be checked by Magento.
Have same this issue even not using Elastic Search and lower magento 2 version
This issue is still relevant on a 2.4.1-p1 magento website with PHP 7.4 & mysql 8.0.
Hi @engcom-Bravo. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:
-
[ ] 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
Details
If the issue has a valid description, the labelIssue: Format is valid
will be added to the issue automatically. Please, edit issue description if needed, until labelIssue: Format is valid
appears. -
[ ] 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add
Issue: Clear Description
label to the issue by yourself. -
[ ] 3. Add
Component: XXXXX
label(s) to the ticket, indicating the components it may be related to. -
[ ] 4. Verify that the issue is reproducible on
2.4-develop
branchDetails
- Add the comment@magento give me 2.4-develop instance
to deploy test instance on Magento infrastructure.
- If the issue is reproducible on2.4-develop
branch, please, add the labelReproduced on 2.4.x
.
- If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here! -
[ ] 5. Add label
Issue: Confirmed
once verification is complete. -
[ ] 6. Make sure that automatic system confirms that report has been added to the backlog.
We have tested it from code side on 2.4-develop branch and confirming this issue.
Even on disabling search suggestions from admin panel and enabling elastic search as per https://devdocs.magento.com/guides/v2.3/release-notes/open-source-2-3-6.html#search , main search bar still hits the MYSQL 'search_query' table.
Due to this search_query table grows and cause performance issue. We need to either index this or use elastic-search for all search related operations.
Admin panel configurations:
db.log file after above configurations and search:
:white_check_mark: Confirmed by @engcom-Bravo
Thank you for verifying the issue. Based on the provided information internal tickets MC-42673
were created
Issue Available: @engcom-Bravo, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.
:x: Cannot export the issue. This GitHub issue is already linked to Jira issue(s): https://jira.corp.magento.com/browse/AC-932
@mflott The issue has been resolved on 2.4-develop branch.
Thank you!
So where is the PR for this or patch? I need to apply it.
@multipasko, it's probably this one: https://github.com/magento/magento2/commit/c90edaa7fbfb8bcd0098de1981e4e022a79a8c13 (the commit has the same ticket number as referenced above)
Hello,
We have below query which is continuous recorded as slow query with magento version 2.4.1
SELECT COUNT(*) FROM (SELECT DISTINCT main_table
.query_text
FROM search_query
AS main_table
WHERE (main_table.store_id IN (1)) AND (num_results > 0) ORDER BY popularity
desc
LIMIT 100) AS result
WHERE (result.query_text = 'test')
so we have to apply https://github.com/magento/magento2/commit/c90edaa7fbfb8bcd0098de1981e4e022a79a8c13 this one as patch?
Can you please confirm!
Thanks, David Andersson
Still having this issue in 2.4.4-p2. This doesn't seem to be fixable with the way it is currently structured. The reason why is it uses select distinct on a table that is going to have millions of records. Why even cache it?
I fail to see how c90edaa could possibly fix the performance issue. All it does is add columns to the select query, which would only add MORE lift for mysql to need to compute distinctness. It is not difficult for this table (in my case) to reach 700k entries. Many likely reach far more. Search requests also have a much higher chance for page cache misses. It's no surprise that mysql performance is a concern in production when isTopSearchResult
takes 700ms on a beefy local machine.
Removing the DISTINCT
operator from the query reduces it from 700ms down to 2ms and produces the same results. The results are always going to be the same too, because the search_query table already has as unique constraint on query_text
and store_id
, and in all cases, the store_id
is being filtered and query_text
is being used as a column in the select.
The solution here should be to simply remove the the DISTINCT
operator from queries within \Magento\Search\Model\ResourceModel\Query\Collection
, because those DISTINCT
operators are effectively not doing anything, and only hurting your query performance.
However, if your performance is caused by a a high search term cardinality (lots of unique terms), it's possible that the mysql inserts are too expensive for your project. The solution in THAT case is a bit more involved. You could instead:
- asynchronously insert search terms into the table in batches to prevent it from slowing down search request times.
- only insert a fraction of the search terms
- stop tracking search terms entirely (this will cause you to lose visibility to user search term behaviors on your site)
Note: I am using 2.4.2-p2. ymmv. I recommend comparing your version to 2.4.2-p2 to evaluate whether this solution is applicable to your specific project.
@engcom-Echo please can you reopen this so that it can be evaluated again. I agree with @cdcrothers that c90edaa7fbfb8bcd0098de1981e4e022a79a8c13 doesn't look like a fix for this.
My magento 2 admin panel is not working at all
when I run "show full processlist;" in MySQL,
it shows this query running multiple times "SELECT DISTINCT COUNT(*) FROM search_query
AS main_table
WHERE (main_table.store_id = 1) AND (num_results > 0)"
I am using magento 2.3.4 community version.
2.4.5-p4 still having issues with this useless feature.
Unfortunately, not enough information was provided to create a Jira ticket. Please make sure you added the following label(s): Reproduced on 2.4.x
, ^Area:.*
Once all required labels are present, please add Issue: Confirmed
label again.
:x: Cannot export the issue. This GitHub issue is already linked to Jira issue(s): https://jira.corp.adobe.com/browse/AC-932
I also facing suddenly high CPU usages on Magento 2.4.4-p1. any quick solution for this.
i am using 2.4.6 commerce version. I have 1.4M rows in search_query table, which is bottle neck of keyword search. Now, keyword search takes around 10 seconds to complete one search, which is ridiculous. Finally, I got the solution and shorten the keyword search function to 2-3 seconds. However, I am not sure is it a right approach.
in /vendor/magento/module-search/Model/ResourceModel/Query/Collection.php public function isTopSearchResult(string $term, int $storeId, int $maxCountCacheableSearchTerms):bool change $select->distinct(false);
The issue happens to mine 2.4.5-p5 community and MySQL 8.0. It's worse than @sixng case because we have over 40M rows in search_query table. MySQL Bug #99717 Peformance regression of parallel count explains the performance issue to the "Select COUNT(*)".
Besides optimizing the query, I would like to recommend Magento to prevent the spam search text as much as possible. That's because I figured out 99% of the rows of search_query is spam search like below.
Any update on this issue? Apparently no one is working on this.
2.4.6-p4 problem still present, I don't think anyone cares... I'm just going to set up a cron to empty the table every certain amount of time.