
Percolate index does not search properly by exact phrase query when stemming enabled

Open usatenko opened this issue 3 years ago • 4 comments

Describe the bug
I am using the latest version, 4.2.0.
If I enable stemming (morphology='stem_en') together with index_exact_words='1' on a percolate index, stored queries that apply the exact form modifier to a phrase (e.g. ="finance business partner") do not match.
With only morphology='stem_en' the search works correctly, and so does index_exact_words='1' without stemming.

To Reproduce
Steps to reproduce the behavior:

Example (incorrect behaviour):
1.1

CREATE TABLE jjj (  
title text  
) type='pq' index_exact_words='1' morphology='stem_en'  

2.1

insert into jjj (id, query) values (1, 'finance business partner');  
insert into jjj (id, query) values (2, '="finance business partner"');  
insert into jjj (id, query) values (3, '"finance business partner"');  

3.1

CALL PQ('jjj', '[{"lid": 123, "title": "finance business partner"}]', 'lid' AS docs_id, 1 AS docs);
+------+-----------+
| id   | documents |
+------+-----------+
|    1 | 123       |
|    3 | 123       |
+------+-----------+
2 rows in set (0.00 sec)

But it should give me three IDs: 1, 2 and 3.

Other example (correct behaviour):
1.2

CREATE TABLE eee (  
title text  
) type='pq' morphology='stem_en'  
  

2.2

insert into eee (id, query) values (1, 'finance business partner');  
insert into eee (id, query) values (2, '="finance business partner"');  
insert into eee (id, query) values (3, '"finance business partner"');  
  

3.2

CALL PQ('eee', '[{"lid": 123, "title": "finance business partner"}]', 'lid' AS docs_id, 1 AS docs);
+------+-----------+
| id   | documents |
+------+-----------+
|    1 | 123       |
|    2 | 123       |
|    3 | 123       |
+------+-----------+
3 rows in set (0.00 sec)
  

Other example (correct behaviour):
1.3

CREATE TABLE lll (  
title text  
) type='pq' index_exact_words='1';  

2.3

insert into lll (id, query) values (1, 'finance business partner');  
insert into lll (id, query) values (2, '="finance business partner"');  
insert into lll (id, query) values (3, '"finance business partner"');  
  

3.3

CALL PQ('lll', '[{"lid": 123, "title": "finance business partner"}]', 'lid' AS docs_id, 1 AS docs);
+------+-----------+
| id   | documents |
+------+-----------+
|    1 | 123       |
|    2 | 123       |
|    3 | 123       |
+------+-----------+
3 rows in set (0.00 sec)

Expected behavior
I expect all three percolate index setups to give the same results.

Describe the environment:
Manticore 4.2.0 15e927b28@211223 release (columnar 1.11.4 327b3d4@211223)
Linux manticore-0.local 5.15.0-2-amd64 #1 SMP Debian 5.15.5-2 (2021-12-18) x86_64 GNU/Linux

Messages from log files:
No specific messages related to this

Additional context
N/A

usatenko avatar Jan 14 '22 08:01 usatenko

Thank you for the good minimal reproducible example.

sanikolaev avatar Jan 17 '22 04:01 sanikolaev

OK, we have worked around this bug in our functionality, but it would still be good to have a fix for it.

usatenko avatar May 24 '22 18:05 usatenko

Sure, we'll fix it eventually. There are more important issues right now, because the combination that triggers this one is not commonly used: percolate + stemming + exact form modifier + phrase operator. So it's going into the backlog until we have time to work on it. Pull requests are very welcome!
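
The four-way combination named above can be narrowed down by storing each ingredient as a separate rule. The following is a minimal sketch, not taken from the thread: the table name pq_narrow and the ids 10 to 13 are hypothetical, and which of the rules match on an affected build is not stated here, so no expected output is shown.

```sql
-- Hypothetical table with the same settings as the failing 'jjj' example.
CREATE TABLE pq_narrow (title text) type='pq' index_exact_words='1' morphology='stem_en';

-- 10: exact form modifier on a single term, no phrase operator
INSERT INTO pq_narrow (id, query) VALUES (10, '=finance');
-- 11: exact form modifier on every term, no phrase operator
INSERT INTO pq_narrow (id, query) VALUES (11, '=finance =business =partner');
-- 12: phrase operator without the exact form modifier (same query as id 3 in the report)
INSERT INTO pq_narrow (id, query) VALUES (12, '"finance business partner"');
-- 13: exact form modifier applied to the whole phrase (same query as id 2 in the report)
INSERT INTO pq_narrow (id, query) VALUES (13, '="finance business partner"');

CALL PQ('pq_narrow', '[{"lid": 123, "title": "finance business partner"}]', 'lid' AS docs_id, 1 AS docs);
```

If rules 10 and 11 are returned but 13 is not, the problem is specific to the modifier applied to a phrase rather than to exact forms in general.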

sanikolaev avatar May 26 '22 06:05 sanikolaev

Still exists on Server version: 5.0.2 348514c86@220530 dev git branch HEAD (no branch) Linux manticore-0.local 5.18.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.5-1 (2022-06-16) x86_64 GNU/Linux

usatenko avatar Sep 05 '22 18:09 usatenko

I've just fixed the incorrect handling of the exact term modifier in percolate queries, in commit 3e4d145d.

You can install the daemon package from the dev repository to get the fix once CI finishes packaging.
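
To check a build that contains the fix, the failing case from the report can be rebuilt from scratch and run again. This is a sketch under the assumption that recreating the table is acceptable in your environment; it reuses the table name and queries from the original report.

```sql
-- Start from a clean table so no rules stored by an older build are reused.
DROP TABLE IF EXISTS jjj;
CREATE TABLE jjj (title text) type='pq' index_exact_words='1' morphology='stem_en';

INSERT INTO jjj (id, query) VALUES (1, 'finance business partner');
INSERT INTO jjj (id, query) VALUES (2, '="finance business partner"');
INSERT INTO jjj (id, query) VALUES (3, '"finance business partner"');

-- Per the expected behaviour in the report, ids 1, 2 and 3 should all be returned.
CALL PQ('jjj', '[{"lid": 123, "title": "finance business partner"}]', 'lid' AS docs_id, 1 AS docs);
```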

tomatolog avatar Mar 22 '23 13:03 tomatolog