magento-lts
magento-lts copied to clipboard
optimize url_rewrite performance
Description (*)
The goal of this PR is to optimize url rewrite performance for big catalogs.
Use of the 'like' operator in sql queries can generate particularly long execution times when the number of records concerned by the query is large.
Mage_Catalog_Model_Url defines a new field (url_type) for categories and products $rewriteData array.
SQL queries in Mage_Catalog_Helper_Category_Url_Rewrite and Mage_Catalog_Helper_Product_Url_Rewrite have been modified to avoid use of 'like' operator. 'eq' operator is used instead.
catalog_setup has been modified to :
- create 'url_type' new field in core_url_rewrite table and associated index
- update existing core_url_rewrite content to match new behavior
Catalog version now defined as 1.6.0.0.19.1.7
Contribution checklist (*)
- [x] Pull request has a meaningful description of its purpose
- [x] All commits are accompanied by meaningful commit messages
- [x] All automated tests passed successfully (all builds are green)
@greglamy do you have any stats showing the performance improvement? Does your patch also handles the updating the core_rewrite table when url is changed? AT first glips I didn't find a code for it. FYI, there are two issues which might be interesting for you regarding performance of the url_rewrite (not directly related to this PR, just FYI) https://github.com/OpenMage/magento-lts/issues/931 https://github.com/OpenMage/magento-lts/issues/932
Just curious, does it still work without anything after category_id IS NOT NULL? Seems like if you don't even need the additional column/condition.
Thanks for the PR!
Please provide some queries with timing comparisons. The id_path column does have an index on it and LIKE can use this index when the clause is a prefix search ('category/%') so I wouldn't be surprised if the results were similar.
Also note that you could use an ENUM('category','product','other') instead of VARCHAR for a more efficient storage and index.
@greglamy you still interested in this PR? Looks like it would be useful.
@greglamy you still interested in this PR? Looks like it would be useful.
This modification is included in all my projects. But I'm not sure to have enough time to continue working on performance testing. I would need for this a huge catalog of 120K+ test products within 1K+ categories which was the real life context that made me develop this patch to win precious minutes. Can try giving some more time to this PR.
This looks to be a good change. Would love if someone who tried this with a large catalog can share their query times.
Here is a script that you might find handy for this type of need to capture queries from a request. With it you can enable MySQL "general log" in seconds, watch the queries happening in real time, press Ctrl+C to stop monitoring and capture the output to a temporary file for later inspection. https://gist.github.com/colinmollenhour/5646619
@joshua-bn @luigifab @colinmollenhour could you check your comments and decide what to do with this PR?
I vote close if there is no activity for now as it is not proven if it has a benefit and there are some changes requested. A query on indexed varchar using a prefix match should be just about as fast or as fast as an indexed exact string comparison.
It is a good observation, though. A proper compound index might give a better result. For example, this compound index:
UNIQUE KEY `UNQ_CORE_URL_REWRITE_ID_PATH_IS_SYSTEM_STORE_ID` (`id_path`,`is_system`,`store_id`),
was probably meant to address these queries but doesn't work - it should be changed to:
UNIQUE KEY `UNQ_CORE_URL_REWRITE_ID_PATH_IS_SYSTEM_STORE_ID` (`is_system`,`store_id`,`id_path`),
So that it can be used with is_system=1 AND store_id=? AND id_path LIKE .... Currently it cannot since id_path is not the right-most column of the index.
it seems a very good idea so it deserves a "don't forget this pr"