manticoresearch icon indicating copy to clipboard operation
manticoresearch copied to clipboard

Equals sign in search query breaks request

Open Nick-S-2018 opened this issue 3 years ago • 5 comments

The bug reported here can be reproduced with HTTP request as well. How to reproduce:

  • create an index with the following settings: create table t(f text) index_exact_words='1' expand_keywords='1';

  • make a select query to the index which has a match clause ending with '=' character, e.g. : curl -v POST http://localhost:9308/sql --data-urlencode "query=select * from t where match('test=')"

  • the following error message is sent in response from Manticore: {"error":"syntax error, unexpected $end near ''"}

UPDATE

It was reported in the original issue that escaping does eliminate the error message but the query still returns incorrect empty result.

Nick-S-2018 avatar Jul 27 '22 11:07 Nick-S-2018

I dont think that is technically a 'bug' as such. = is a extended query operator, so has special limitations where it can be used. Its only meant ot be used before a keyword (to imply exact form match only)

If you not expecting = to be treated as a operator, then escape it. ... ie run the query as test\=

(its a bit more complicated in actual queries, as the \ might itself need escaping in the 'single quoted' sql string')

 select * from t where match('test\\=')

Might also need escaping again in the curl command line, not sure!)

barryhunter avatar Jul 27 '22 12:07 barryhunter

@barryhunter thank you for your answer. I've updated the issue taking it into account.

Nick-S-2018 avatar Jul 27 '22 13:07 Nick-S-2018

Re, the 'update'. Escaping the = should work, and then as not in (default) charset_table, will be treated as a seperator. So will match the same as 'test' on it is own.

But will get no results in above example, because didn't insert any data into the index! The index is empty.

mysql> drop table if exists t; create table t(f text) index_exact_words='1'; insert into t values(0,'Product 123 name='); select * from t where match('name\\=');
+-------------------+-------------------+
| id                | f                 |
+-------------------+-------------------+
| 73773252815618049 | Product 123 name= |   <-- the row is returned
+-------------------+-------------------+
mysql> show meta;
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| total          | 1     |
| total_found    | 1     |
| total_relation | eq    |
| time           | 0.004 |
| keyword[0]     | name  |  <--- the = is not shown as it a seperator. 
| docs[0]        | 1     |
| hits[0]        | 1     |
+----------------+-------+
mysql>

Tested at https://play.manticoresearch.com/replication/

barryhunter avatar Jul 27 '22 14:07 barryhunter

I think the MRE is:

mysql> drop table if exists t; create table t(f text) index_exact_words='1' expand_keywords='1'; insert into t values(1,'a'); select * from t where match('a\\='); select *
from t where match('a\\-');
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.13 sec)

--------------
create table t(f text) index_exact_words='1' expand_keywords='1'
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t values(1,'a')
--------------

Query OK, 1 row affected (0.01 sec)

--------------
select * from t where match('a\\=')
--------------

Empty set (0.00 sec)

--------------
select * from t where match('a\\-')
--------------

+------+------+
| id   | f    |
+------+------+
|    1 | a    |
+------+------+
1 row in set (0.00 sec)

Problem: match('a\\=') is expected to find a since = is not in the charset_table, but it doesn't.

As an example a\\- works fine (finds a).

Let's compare the metas:

a=:

show meta
--------------

+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| total          | 0     |
| total_found    | 0     |
| total_relation | eq    |
| time           | 0.000 |
| keyword[0]     | =a=   |
| docs[0]        | 0     |
| hits[0]        | 0     |
| keyword[1]     | a=    |
| docs[1]        | 0     |
| hits[1]        | 0     |
+----------------+-------+
10 rows in set (0.00 sec)

vs

a- :

+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| total          | 1     |
| total_found    | 1     |
| total_relation | eq    |
| time           | 0.000 |
| keyword[0]     | =a    |
| docs[0]        | 1     |
| hits[0]        | 1     |
| keyword[1]     | a     |
| docs[1]        | 1     |
| hits[1]        | 1     |
+----------------+-------+
10 rows in set (0.00 sec)

As we can see, escaping of = and - differ. Seems that when = is escaped it's then get tokenized wrong (like if it was in the charset_table):

mysql> drop table if exists t; create table t(f text) index_exact_words='1' expand_keywords='1' charset_table='non_cjk, ='; insert into t values(1,'a='); select * from t where match('a\\='); select * from t where match('a\\='); show meta;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.13 sec)

--------------
create table t(f text) index_exact_words='1' expand_keywords='1' charset_table='non_cjk, ='
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t values(1,'a=')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
select * from t where match('a\\=')
--------------

+------+------+
| id   | f    |
+------+------+
|    1 | a=   |
+------+------+
1 row in set (0.00 sec)

--------------
select * from t where match('a\\=')
--------------

+------+------+
| id   | f    |
+------+------+
|    1 | a=   |
+------+------+
1 row in set (0.00 sec)

--------------
show meta
--------------

+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| total          | 1     |
| total_found    | 1     |
| total_relation | eq    |
| time           | 0.000 |
| keyword[0]     | =a=   |
| docs[0]        | 1     |
| hits[0]        | 1     |
| keyword[1]     | a=    |
| docs[1]        | 1     |
| hits[1]        | 1     |
+----------------+-------+
10 rows in set (0.00 sec)

sanikolaev avatar Jul 28 '22 05:07 sanikolaev

Ah, it seems expand_keywords is critical to reproduce it. With that can reproduce. Sorry for the confusion.

barryhunter avatar Jul 28 '22 11:07 barryhunter

fixed at the 85057e7b

You need to update daemon from dev repo package to get issue fixed.

tomatolog avatar Mar 29 '23 07:03 tomatolog