Query performance is dependent on the order of keywords in match
Bug Description:
Consider the query:
select id,name FROM listing2 WHERE match(' @n ( (5 марок unc) )') ORDER by id asc LIMIT 0,20 OPTION ranker=none, cutoff=500000, max_predicted_time=100; SHOW META;
Meta Result:
+---------------------+------------------------------------------------------------------+
| Variable_name | Value |
+---------------------+------------------------------------------------------------------+
| warning | table listing2: predicted query time exceeded max_predicted_time |
| total | 20 |
| total_found | 549 |
| total_relation | eq |
| time | 0.129 |
| local_fetched_docs | 4879631 |
| local_fetched_hits | 0 |
| local_fetched_skips | 239401 |
| predicted_time | 802 |
| keyword[0] | 5 |
| docs[0] | 1394039 |
| hits[0] | 1595430 |
| keyword[1] | =марок |
| docs[1] | 164728 |
| hits[1] | 170966 |
| keyword[2] | unc |
| docs[2] | 329515 |
| hits[2] | 332916 |
| keyword[3] | марка |
| docs[3] | 3947825 |
| hits[3] | 4777475 |
| keyword[4] | маркий |
| docs[4] | 3942153 |
| hits[4] | 4743454 |
+---------------------+------------------------------------------------------------------+
And this:
select id,name FROM listing2 WHERE match(' @n ( (unc 5 марок) )') ORDER by id asc LIMIT 0,20 OPTION ranker=none, cutoff=500000, max_predicted_time=100; SHOW META;
Meta:
+---------------------+--------------+
| Variable_name | Value |
+---------------------+--------------+
| total | 20 |
| total_found | 1038 |
| total_relation | eq |
| time | 0.019 |
| local_fetched_docs | 2937496 |
| local_fetched_hits | 0 |
| local_fetched_skips | 140524 |
| predicted_time | 475 |
| keyword[0] | 5 |
| docs[0] | 1394047 |
| hits[0] | 1595439 |
| keyword[1] | =марок |
| docs[1] | 164729 |
| hits[1] | 170967 |
| keyword[2] | unc |
| docs[2] | 329519 |
| hits[2] | 332920 |
| keyword[3] | марка |
| docs[3] | 3947845 |
| hits[3] | 4777502 |
| keyword[4] | маркий |
| docs[4] | 3942173 |
| hits[4] | 4743480 |
+---------------------+--------------+
The difference in queries is only the order of match parameters. But first query is 10x slower then the second and hits the max_predicted_time limiter.
The solution I see: Optimiser should sort keywords in "AND" match blocks in order of number of docs in the index first. Then start searching.
It looks like it is the same issue as in 1641.
Manticore Search Version:
6.3.0
Operating System Version:
Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
Have you tried the latest development version?
None
Internal Checklist:
To be completed by the assignee. Check off tasks that have been completed or are not applicable.
- [ ] Implementation completed
- [ ] Tests developed
- [ ] Documentation updated
- [ ] Documentation reviewed
- [ ] Changelog updated
#1641
I can reproduce the issue in the latest dev version on dev2 (in /home/snikolaev/issue-1641):
snikolaev@dev2:~/issue-1641$ rm /tmp/1; for n in `seq 1 1000`; do echo "select id FROM listing2 WHERE match(' @n ( (5 марок unc) )');" >> /tmp/1; done; time mysql -P9315 -h0 < /tmp/1 > /dev/null
real 0m28.920s
user 0m0.017s
sys 0m0.065s
snikolaev@dev2:~/issue-1641$ rm /tmp/2; for n in `seq 1 1000`; do echo "select id FROM listing2 WHERE match(' @n ( (unc 5 марок) )');" >> /tmp/2; done; time mysql -P9315 -h0 < /tmp/2 > /dev/null
real 0m13.169s
user 0m0.047s
sys 0m0.036s
Query details:
mysql> select id FROM listing2 WHERE match(' @n ( (5 марок unc) )'); SHOW META;
+-----------+
| id |
+-----------+
| 38867052 |
| 303005086 |
| 283207653 |
| 303012522 |
| 303006756 |
| 303006314 |
| 303005955 |
| 303005799 |
| 303005768 |
| 303336917 |
| 303098944 |
| 303092288 |
| 303092800 |
| 303094309 |
| 230253739 |
| 293791820 |
| 267009727 |
| 275710727 |
| 293828205 |
| 298788392 |
+-----------+
20 rows in set (0.03 sec)
--- 20 out of 700 results in 30ms ---
+----------------+--------------+
| Variable_name | Value |
+----------------+--------------+
| total | 20 |
| total_found | 700 |
| total_relation | eq |
| time | 0.030 |
| keyword[0] | 5 |
| docs[0] | 1286113 |
| hits[0] | 1464194 |
| keyword[1] | =марок |
| docs[1] | 139832 |
| hits[1] | 145181 |
| keyword[2] | unc |
| docs[2] | 263203 |
| hits[2] | 264194 |
| keyword[3] | марка |
| docs[3] | 3622880 |
| hits[3] | 4319542 |
| keyword[4] | маркий |
| docs[4] | 3617138 |
| hits[4] | 4287922 |
+----------------+--------------+
19 rows in set (0.00 sec)
mysql> select id FROM listing2 WHERE match(' @n ( (unc 5 марок) )'); SHOW META;
+-----------+
| id |
+-----------+
| 283207653 |
| 303012522 |
| 303006756 |
| 303105314 |
| 303105310 |
| 303105309 |
| 303105308 |
| 303105306 |
| 303105305 |
| 303105302 |
| 303105301 |
| 303105299 |
| 303105298 |
| 303105297 |
| 303105295 |
| 303105294 |
| 303105292 |
| 303105291 |
| 303644091 |
| 303644090 |
+-----------+
20 rows in set (0.02 sec)
--- 20 out of 700 results in 16ms ---
+----------------+--------------+
| Variable_name | Value |
+----------------+--------------+
| total | 20 |
| total_found | 700 |
| total_relation | eq |
| time | 0.016 |
| keyword[0] | 5 |
| docs[0] | 1286113 |
| hits[0] | 1464194 |
| keyword[1] | =марок |
| docs[1] | 139832 |
| hits[1] | 145181 |
| keyword[2] | unc |
| docs[2] | 263203 |
| hits[2] | 264194 |
| keyword[3] | марка |
| docs[3] | 3622880 |
| hits[3] | 4319542 |
| keyword[4] | маркий |
| docs[4] | 3617138 |
| hits[4] | 4287922 |
+----------------+--------------+
19 rows in set (0.00 sec)