Highlighting inconsistent with more than 2 NEAR terms

Open • regstuff opened this issue 1 year ago • 1 comment

Describe the bug

When searching for three terms chained with the NEAR operator, word1 is not highlighted if two instances of word2 satisfy the proximity rule but only one instance of word3 is present. For example:

word1 randomword word2 randomword word3 randomword word2

Highlighting of word1 breaks when both instances of word2 satisfy the proximity condition.
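To make the two-instance condition concrete, here is a minimal Python sketch that lists which occurrences of one term fall within a given distance of another. It is only an illustration: the helper `near_partners` is hypothetical, and it assumes that NEAR/N means "at most N token positions apart" with plain whitespace tokenization. It does not reproduce Manticore's actual matching or highlighting code.

```python
def near_partners(text, left, right, n):
    """Return 1-based positions of `right` that lie within `n` positions of any `left`.

    Assumption: NEAR/n is modeled as |pos(left) - pos(right)| <= n.
    """
    tokens = text.split()
    left_pos = [i for i, t in enumerate(tokens, 1) if t == left]
    return [
        i
        for i, t in enumerate(tokens, 1)
        if t == right and any(abs(i - p) <= n for p in left_pos)
    ]


doc = "word1 randomword word2 randomword word3 randomword word2"

# Narrow window: only the first word2 is close enough to word1.
print(near_partners(doc, "word1", "word2", 2))  # [3]
# Wide window: both word2 occurrences qualify, which is the failing case.
print(near_partners(doc, "word1", "word2", 7))  # [3, 7]
# Both word2 occurrences are also within 2 positions of the single word3.
print(near_partners(doc, "word3", "word2", 2))  # [3, 7]
```

Under this simplified model, the bug shows up exactly when the first call would return more than one position for word2.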

[MRE] The Python code below creates two docs. In the first doc, 'word' appears twice. In the second, the last instance of 'word' is replaced by 'punctuation'. Two searches are run. In the second search, the wider NEAR/7 window also brings the second instance of 'word' in doc 1 into proximity, which breaks highlighting for 'sentence'.

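import manticoresearch

# Assumed setup (not in the original report): a local Manticore instance on HTTP port 9308.
config = manticoresearch.Configuration(host="http://127.0.0.1:9308")
client = manticoresearch.ApiClient(config)
indexApi = manticoresearch.IndexApi(client)
utilsApi = manticoresearch.UtilsApi(client)

ixname = 'products'
# Doc 1: 'word' occurs twice.
row = {'title': '<p>sentence and word and letter and word</p>', 'contentid': '1'}
resp = indexApi.insert({"index" : ixname, "doc" : row})

# Doc 2: the last 'word' is replaced by 'punctuation'.
row = {'title': '<p>sentence and word and letter and punctuation</p>', 'contentid': '2'}
resp = indexApi.insert({"index" : ixname, "doc" : row})

# Search 1: NEAR/2 between 'sentence' and 'word'.
resp = utilsApi.sql('SELECT *, HIGHLIGHT({before_match=\'<span class="match">\', after_match=\'</span>\', limit=0, html_strip_mode=\'retain\'}, \'\') FROM products WHERE MATCH(\'sentence NEAR/2 word NEAR/2 letter\')')
print(resp[0]['data'])
print('+' * 10)  # separator between the two result sets
# Search 2: NEAR/7, so the second 'word' in doc 1 is also in range.
resp = utilsApi.sql('SELECT *, HIGHLIGHT({before_match=\'<span class="match">\', after_match=\'</span>\', limit=0, html_strip_mode=\'retain\'}, \'\') FROM products WHERE MATCH(\'sentence NEAR/7 word NEAR/2 letter\')')
print(resp[0]['data'])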

regstuff, Sep 09 '22 15:09

MRE in SQL form

mysql> drop table if exists t; create table t (f text); insert into t(f) values('sentence and word and letter and word'); select highlight() from t where match('sentence NEAR/7 word NEAR/2 letter');
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t (f text)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t(f) values('sentence and word and letter and word')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
select highlight() from t where match('sentence NEAR/7 word NEAR/2 letter')
--------------

+-----------------------------------------------------+
| highlight()                                         |
+-----------------------------------------------------+
| sentence and <b>word</b> and <b>letter</b> and word |
+-----------------------------------------------------+
1 row in set (0.00 sec)

'sentence' is expected to be highlighted, but it is not. If NEAR/7 is replaced with NEAR/2, it does get highlighted:

select highlight() from t where match('sentence NEAR/2 word NEAR/2 letter')
--------------

+------------------------------------------------------------+
| highlight()                                                |
+------------------------------------------------------------+
| <b>sentence</b> and <b>word</b> and <b>letter</b> and word |
+------------------------------------------------------------+
1 row in set (0.00 sec)
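The difference between the two outputs can be checked mechanically. The following is a small sketch, not part of Manticore or its client: the helpers `highlighted_terms` and `missing_highlights` are hypothetical names, and the snippet strings are copied from the two highlight() results shown above. It assumes the default `<b>`/`</b>` markers.

```python
import re


def highlighted_terms(snippet, before="<b>", after="</b>"):
    """Collect the lowercased text of every span wrapped in the highlight markers."""
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    return {m.lower() for m in re.findall(pattern, snippet)}


def missing_highlights(snippet, terms):
    """Return the query terms that never appear inside a highlight marker."""
    return set(terms) - highlighted_terms(snippet)


terms = ["sentence", "word", "letter"]
near7 = "sentence and <b>word</b> and <b>letter</b> and word"
near2 = "<b>sentence</b> and <b>word</b> and <b>letter</b> and word"

print(missing_highlights(near7, terms))  # {'sentence'}
print(missing_highlights(near2, terms))  # set()
```

A check like this could serve as a regression test once the NEAR/7 case is fixed: the first call should then also return an empty set.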

sanikolaev, Sep 12 '22 02:09