manticoresearch icon indicating copy to clipboard operation
manticoresearch copied to clipboard

Query performance is dependent on the order of keywords in match

Open starinacool opened this issue 1 year ago • 2 comments

Bug Description:

Consider the query:

select id,name  FROM listing2 WHERE match(' @n (  (5 марок unc)  )') ORDER by id asc LIMIT 0,20 OPTION ranker=none, cutoff=500000, max_predicted_time=100; SHOW META;

Meta Result:

+---------------------+------------------------------------------------------------------+
| Variable_name       | Value                                                            |
+---------------------+------------------------------------------------------------------+
| warning             | table listing2: predicted query time exceeded max_predicted_time |
| total               | 20                                                               |
| total_found         | 549                                                              |
| total_relation      | eq                                                               |
| time                | 0.129                                                            |
| local_fetched_docs  | 4879631                                                          |
| local_fetched_hits  | 0                                                                |
| local_fetched_skips | 239401                                                           |
| predicted_time      | 802                                                              |
| keyword[0]          | 5                                                                |
| docs[0]             | 1394039                                                          |
| hits[0]             | 1595430                                                          |
| keyword[1]          | =марок                                                           |
| docs[1]             | 164728                                                           |
| hits[1]             | 170966                                                           |
| keyword[2]          | unc                                                              |
| docs[2]             | 329515                                                           |
| hits[2]             | 332916                                                           |
| keyword[3]          | марка                                                            |
| docs[3]             | 3947825                                                          |
| hits[3]             | 4777475                                                          |
| keyword[4]          | маркий                                                           |
| docs[4]             | 3942153                                                          |
| hits[4]             | 4743454                                                          |
+---------------------+------------------------------------------------------------------+

And this:

select id,name  FROM listing2 WHERE match(' @n (  (unc 5 марок)  )') ORDER by id asc LIMIT 0,20 OPTION ranker=none, cutoff=500000, max_predicted_time=100; SHOW META;

Meta:

+---------------------+--------------+
| Variable_name       | Value        |
+---------------------+--------------+
| total               | 20           |
| total_found         | 1038         |
| total_relation      | eq           |
| time                | 0.019        |
| local_fetched_docs  | 2937496      |
| local_fetched_hits  | 0            |
| local_fetched_skips | 140524       |
| predicted_time      | 475          |
| keyword[0]          | 5            |
| docs[0]             | 1394047      |
| hits[0]             | 1595439      |
| keyword[1]          | =марок       |
| docs[1]             | 164729       |
| hits[1]             | 170967       |
| keyword[2]          | unc          |
| docs[2]             | 329519       |
| hits[2]             | 332920       |
| keyword[3]          | марка        |
| docs[3]             | 3947845      |
| hits[3]             | 4777502      |
| keyword[4]          | маркий       |
| docs[4]             | 3942173      |
| hits[4]             | 4743480      |
+---------------------+--------------+

The difference in queries is only the order of match parameters. But first query is 10x slower then the second and hits the max_predicted_time limiter.

The solution I see: Optimiser should sort keywords in "AND" match blocks in order of number of docs in the index first. Then start searching.

It looks like it is the same issue as in 1641.

Manticore Search Version:

6.3.0

Operating System Version:

Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux

Have you tried the latest development version?

None

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • [ ] Implementation completed
  • [ ] Tests developed
  • [ ] Documentation updated
  • [ ] Documentation reviewed
  • [ ] Changelog updated

starinacool avatar Jul 25 '24 14:07 starinacool

#1641

starinacool avatar Jul 25 '24 14:07 starinacool

I can reproduce the issue in the latest dev version on dev2 (in /home/snikolaev/issue-1641):

snikolaev@dev2:~/issue-1641$ rm /tmp/1; for n in `seq 1 1000`; do echo "select id  FROM listing2 WHERE match(' @n (  (5 марок unc)  )');" >> /tmp/1; done; time mysql -P9315 -h0 < /tmp/1 > /dev/null

real	0m28.920s
user	0m0.017s
sys	0m0.065s

snikolaev@dev2:~/issue-1641$ rm /tmp/2; for n in `seq 1 1000`; do echo "select id  FROM listing2 WHERE match(' @n (  (unc 5 марок)  )');" >> /tmp/2; done; time mysql -P9315 -h0 < /tmp/2 > /dev/null

real	0m13.169s
user	0m0.047s
sys	0m0.036s

Query details:

mysql> select id FROM listing2 WHERE match(' @n (  (5 марок unc)  )'); SHOW META;
+-----------+
| id        |
+-----------+
|  38867052 |
| 303005086 |
| 283207653 |
| 303012522 |
| 303006756 |
| 303006314 |
| 303005955 |
| 303005799 |
| 303005768 |
| 303336917 |
| 303098944 |
| 303092288 |
| 303092800 |
| 303094309 |
| 230253739 |
| 293791820 |
| 267009727 |
| 275710727 |
| 293828205 |
| 298788392 |
+-----------+
20 rows in set (0.03 sec)
--- 20 out of 700 results in 30ms ---

+----------------+--------------+
| Variable_name  | Value        |
+----------------+--------------+
| total          | 20           |
| total_found    | 700          |
| total_relation | eq           |
| time           | 0.030        |
| keyword[0]     | 5            |
| docs[0]        | 1286113      |
| hits[0]        | 1464194      |
| keyword[1]     | =марок       |
| docs[1]        | 139832       |
| hits[1]        | 145181       |
| keyword[2]     | unc          |
| docs[2]        | 263203       |
| hits[2]        | 264194       |
| keyword[3]     | марка        |
| docs[3]        | 3622880      |
| hits[3]        | 4319542      |
| keyword[4]     | маркий       |
| docs[4]        | 3617138      |
| hits[4]        | 4287922      |
+----------------+--------------+
19 rows in set (0.00 sec)

mysql> select id FROM listing2 WHERE match(' @n (  (unc 5 марок)  )'); SHOW META;
+-----------+
| id        |
+-----------+
| 283207653 |
| 303012522 |
| 303006756 |
| 303105314 |
| 303105310 |
| 303105309 |
| 303105308 |
| 303105306 |
| 303105305 |
| 303105302 |
| 303105301 |
| 303105299 |
| 303105298 |
| 303105297 |
| 303105295 |
| 303105294 |
| 303105292 |
| 303105291 |
| 303644091 |
| 303644090 |
+-----------+
20 rows in set (0.02 sec)
--- 20 out of 700 results in 16ms ---

+----------------+--------------+
| Variable_name  | Value        |
+----------------+--------------+
| total          | 20           |
| total_found    | 700          |
| total_relation | eq           |
| time           | 0.016        |
| keyword[0]     | 5            |
| docs[0]        | 1286113      |
| hits[0]        | 1464194      |
| keyword[1]     | =марок       |
| docs[1]        | 139832       |
| hits[1]        | 145181       |
| keyword[2]     | unc          |
| docs[2]        | 263203       |
| hits[2]        | 264194       |
| keyword[3]     | марка        |
| docs[3]        | 3622880      |
| hits[3]        | 4319542      |
| keyword[4]     | маркий       |
| docs[4]        | 3617138      |
| hits[4]        | 4287922      |
+----------------+--------------+
19 rows in set (0.00 sec)

sanikolaev avatar Aug 07 '24 11:08 sanikolaev