manticoresearch
Percolate index does not search properly by exact phrase query when stemming enabled
Describe the bug
I am using the latest version, 4.2.0.
If I enable stemming mode morphology='stem_en' in combination with e.g. index_exact_words='1' for the percolate index, it does not search properly by exact phrase query.
Each setting works on its own: morphology='stem_en' alone matches correctly, and so does index_exact_words='1' without stemming.
To Reproduce
Steps to reproduce the behavior:
Example (incorrect behaviour):
1.1
CREATE TABLE jjj (
title text
) type='pq' index_exact_words='1' morphology='stem_en';
2.1
insert into jjj (id, query) values (1, 'finance business partner');
insert into jjj (id, query) values (2, '="finance business partner"');
insert into jjj (id, query) values (3, '"finance business partner"');
3.1
CALL PQ('jjj', '[{"lid": 123, "title": "finance business partner"}]', 'lid' AS docs_id, 1 AS docs);
+------+-----------+
| id   | documents |
+------+-----------+
| 1    | 123       |
| 3    | 123       |
+------+-----------+
2 rows in set (0.00 sec)
But it should give me three IDs: 1, 2 and 3.
Another example (correct behaviour):
1.2
CREATE TABLE eee (
title text
) type='pq' morphology='stem_en';
2.2
insert into eee (id, query) values (1, 'finance business partner');
insert into eee (id, query) values (2, '="finance business partner"');
insert into eee (id, query) values (3, '"finance business partner"');
3.2
CALL PQ('eee', '[{"lid": 123, "title": "finance business partner"}]', 'lid' AS docs_id, 1 AS docs);
+------+-----------+
| id   | documents |
+------+-----------+
| 1    | 123       |
| 2    | 123       |
| 3    | 123       |
+------+-----------+
3 rows in set (0.00 sec)
Another example (correct behaviour):
1.3
CREATE TABLE lll (
title text
) type='pq' index_exact_words='1';
2.3
insert into lll (id, query) values (1, 'finance business partner');
insert into lll (id, query) values (2, '="finance business partner"');
insert into lll (id, query) values (3, '"finance business partner"');
3.3
CALL PQ('lll', '[{"lid": 123, "title": "finance business partner"}]', 'lid' AS docs_id, 1 AS docs);
+------+-----------+
| id   | documents |
+------+-----------+
| 1    | 123       |
| 2    | 123       |
| 3    | 123       |
+------+-----------+
3 rows in set (0.00 sec)
Expected behavior
I expect all three percolate index setups to give the same results.
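For anyone who wants to script this comparison, here is a minimal sketch in Python. It only diffs the matched IDs against the expected set; the ID sets are copied by hand from the three CALL PQ outputs above, and the names `observed`, `expected`, and `missing_ids` are made up for illustration. It does not talk to Manticore itself, so you would have to fill `observed` from your own client.

```python
# Matched stored-query IDs per table, copied from the CALL PQ outputs in this report.
observed = {
    "jjj": {1, 3},      # index_exact_words='1' + morphology='stem_en'
    "eee": {1, 2, 3},   # morphology='stem_en' only
    "lll": {1, 2, 3},   # index_exact_words='1' only
}

# All three stored queries are expected to match the document.
expected = {1, 2, 3}


def missing_ids(observed, expected):
    """Return {table: sorted expected IDs that did not match}, only for tables with gaps."""
    return {
        table: sorted(expected - ids)
        for table, ids in observed.items()
        if expected - ids
    }


print(missing_ids(observed, expected))  # prints {'jjj': [2]}
```

With the outputs from this report, only the table that combines both settings is flagged, and the missing ID is the one whose query uses the exact form modifier on a phrase.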
Describe the environment:
Manticore 4.2.0 15e927b28@211223 release (columnar 1.11.4 327b3d4@211223)
Linux manticore-0.local 5.15.0-2-amd64 #1 SMP Debian 5.15.5-2 (2021-12-18) x86_64 GNU/Linux
Messages from log files:
No specific messages related to this
Additional context
N/A
Thank you for the good minimal reproducible example.
OK, we have worked around this bug on our side, but it would still be good to have a fix for it.
Sure, we'll fix it eventually. There are simply more important issues right now, since the combination that triggers this one is not commonly used: percolate + stemming + exact form modifier + phrase operator. So it's going into the backlog until we have time to work on it. Pull requests are very welcome!
Still exists on Server version: 5.0.2 348514c86@220530 dev git branch HEAD (no branch) Linux manticore-0.local 5.18.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.5-1 (2022-06-16) x86_64 GNU/Linux
I've just fixed the percolate query's incorrect handling of the exact term modifier in 3e4d145d.
You can install the daemon package from the dev repository to get the fix once CI finishes packaging.