manticoresearch icon indicating copy to clipboard operation
manticoresearch copied to clipboard

Crash with using Attribute and Stored Field in SELECT expression

Open barryhunter opened this issue 2 years ago • 13 comments

Describe the bug Manticore crashes when running certain queries. Seems to be when using CONCAT in the SELECT part

THis is a query consistent crashing

 SELECT CONCAT('') AS url, CONCAT(label, ' [',TO_STRING(images),']') AS title, CONCAT('Automatic Cluster') AS `type` FROM gridimage_group_stat WHERE MATCH('SX5663') AND ANY(image_ids) IN (2744944) ORDER BY images DESC;

It works OK if just use SELECT * FROM. 'label' is a string in docstore, and 'images' is a numeric attribute.

I know doing this sort of string expressions, would be best done in the application (ie just return the two values directly, then form result set in app). But the query is embedded in other code, that uses a standard function for forming queries.

Details here: https://gist.github.com/barryhunter/2e243cb9ce91840db7df2cf8a192376a have just dumped a small sample of rows from the whole index, but crash happens even with the small number of rows

To Reproduce Steps to reproduce the behavior:

  1. Import data (from gist above)
  2. Run query
  3. Manticore Crashes

Describe the environment:

  • Manticore Search version: 5.0.0 b4cb7da02@220518 release
  • OS version: 5.0.0.2 Helm Chart

Messages from log files: (searchd.log crash dump in gist)

https://gist.github.com/barryhunter/2e243cb9ce91840db7df2cf8a192376a (searchd.log crash dump in the gist)

Additional context

I can reproduce this on https://play.manticoresearch.com/replication/

Just pasted the CREATE TABLE and hte REPLACE INTO from the gist, then run the SELECT

Same thing ERROR 2013 (HY000): Lost connection to MySQL server during query

barryhunter avatar Aug 26 '22 15:08 barryhunter

I've narrowed it down I think to one using the label in CONCAT (a field in the docstore) AND the TO_STRING() that causes the issue

RT.staging>SELECT CONCAT('test',TO_STRING(images)) AS test FROM gridimage_group_stat WHERE MATCH('SX5663') AND ANY(image_ids) IN (2744944) ORDER BY images DESC;
+-------+
| test  |
+-------+
| test7 |
| test4 |
| test3 |
| test2 |
+-------+
4 rows in set (0.003 sec)

RT.staging>SELECT CONCAT(label,'test',TO_STRING(images)) AS test FROM gridimage_group_stat WHERE MATCH('SX5663') AND ANY(image_ids) IN (2744944) ORDER BY images DESC;
ERROR 2013 (HY000): Lost connection to MySQL server during query

RT.staging>SELECT label,images FROM gridimage_group_stat WHERE MATCH('SX5663') AND ANY(image_ids) IN (2744944) ORDER BY images DESC;
+---------------------------+--------+
| label                     | images |
+---------------------------+--------+
| Lee Moor                  |      7 |
| Shaugh Prior Bridleway 44 |      4 |
| Quarry Road               |      3 |
| Road Near                 |      2 |
+---------------------------+--------+

RT.staging>SELECT CONCAT(label,'test'),images FROM gridimage_group_stat WHERE MATCH('SX5663') AND ANY(image_ids) IN (2744944) ORDER BY images DESC;
+-------------------------------+--------+
| concat(label,'test')          | images |
+-------------------------------+--------+
| Lee Moortest                  |      7 |
| Shaugh Prior Bridleway 44test |      4 |
| Quarry Roadtest               |      3 |
| Road Neartest                 |      2 |
+-------------------------------+--------+
4 rows in set (0.001 sec)

barryhunter avatar Aug 26 '22 16:08 barryhunter

you can not use docstore in any expressions as it is the hard coded different part of the engine

docstore could be only part of the select list and could be source for highlight any other usage are invalid

we already have ticket in backlog to produce parse error in case docstore column used in any scenario beside I said

ie fetch from docstore done at the very last stage for limit documents at master node to embed into result set for send to client

tomatolog avatar Aug 26 '22 16:08 tomatolog

I will link this ticket to ticket at GitLab with the fix planned and will close together along the fix that will produce query error will be pushed

tomatolog avatar Aug 26 '22 16:08 tomatolog

Hmm, SELECT CONCAT(label,'test') AS... works. (ie can use the an expression on docstore. )

Are you saying can't use expression based on attributes and docstore at the same time?

barryhunter avatar Aug 26 '22 16:08 barryhunter

in case you have field + string attribute that also stores field data in docstore you could use expression on it without any issue, however from your gist I was sure that you have only field with docstore but not the attribute

tomatolog avatar Aug 26 '22 16:08 tomatolog

at your gist label is a text data type and our manual said text is a full-text field that also got stored , ie there is no label attribute that could participate in the expression

tomatolog avatar Aug 26 '22 16:08 tomatolog

Label is only in docstore, not attribute. Can use label in expressions as such...

RT.staging>describe gridimage_group_stat;
+----------------+--------+----------------+
| Field          | Type   | Properties     |
+----------------+--------+----------------+
| id             | bigint |                |
| grid_reference | text   | indexed        |
| label          | text   | indexed stored |
| images         | uint   |                |
| users          | uint   |                |
| grid_reference | string |                |
| image_ids      | mva    |                |
+----------------+--------+----------------+
7 rows in set (0.001 sec)

RT.staging>SELECT CONCAT(label,'test'),images FROM gridimage_group_stat WHERE MATCH('SX5663') AND ANY(image_ids) IN (2744944) ORDER BY images DESC;
+-------------------------------+--------+
| concat(label,'test')          | images |
+-------------------------------+--------+
| Lee Moortest                  |      7 |
| Shaugh Prior Bridleway 44test |      4 |
| Quarry Roadtest               |      3 |
| Road Neartest                 |      2 |
+-------------------------------+--------+
4 rows in set (0.001 sec)

RT.staging>select SUBSTRING_INDEX(label,' ',1),regex(label,'Road'),label  FROM gridimage_group_stat;
+------------------------------+---------------------+------------------------------------------------+
| substring_index(label,' ',1) | regex(label,'Road') | label                                          |
+------------------------------+---------------------+------------------------------------------------+
| Blackaton                    |                   0 | Blackaton Cross                                |
| Boundary                     |                   0 | Boundary Stone                                 |
| Car                          |                   0 | Car Park                                       |
| Claypit                      |                   0 | Claypit                                        |
| Emmets                       |                   0 | Emmets Post and Tumulus                        |
| Evidence                     |                   0 | Evidence Given Regarding a Nearby Stone Called |
| Lee                          |                   0 | Lee Moor                                       |
| Old                          |                   0 | Old Boundary Marker                            |
| Old                          |                   1 | Old Cornwood Road                              |
| Quarry                       |                   1 | Quarry Road                                    |
| Road                         |                   1 | Road Near                                      |
| Shaugh                       |                   0 | Shaugh Lake China Clay Works                   |
| Shaugh                       |                   0 | Shaugh Prior Bridleway 44                      |
| Track                        |                   0 | Track                                          |
| Trowlesworthy                |                   0 | Trowlesworthy Warren House                     |
+------------------------------+---------------------+------------------------------------------------+
15 rows in set (0.001 sec)

RT.staging>select SUBSTRING_INDEX(label,' ',images),regex(label,'Road'),label  FROM gridimage_group_stat;
ERROR 2013 (HY000): Lost connection to MySQL server during query

That last one is showing an expression that uses docstore and attribute fails. Otherwise expressions using ONLY docstore work.

barryhunter avatar Aug 26 '22 16:08 barryhunter

btw, it fine if mixed expressions are not supported, but yes, it would definitely be preferable to get an error message, rather than a crash.

Not seen anywhere documented that can't use docstore values in expressions, just that docstore is 'post limit' looked up at last stage. But that seems to still allow expressions.

barryhunter avatar Aug 26 '22 16:08 barryhunter

fetch of single document from the docstore is a slow operation and at your example you fetching every document from the docstore during full-scan drop all data but keep label field add something via expression make result dynamic string then drop that document when sorter got filled up and try to sort docs out

this case will be very slow on production data with lot of documents and cause a lot of pressure on OS memory manager

I was curious why it work at all with docstore expressions

tomatolog avatar Aug 26 '22 21:08 tomatolog

Are you suggesting that a query like

 select concat(label,' + test'),images from gridimage_group_stat order by images desc;

would be evaluating the expression and getting the label from docstore for every document (the index has 2.3M docs)., even though only 20 rows are returned? I assumed the expression would only be executed for the 20 documents returned (ie post-limit).

It doesnt seem notable slower https://gist.github.com/barryhunter/24b26db18fb7c9a1cea5fd4a45234541 SHOW PROFILE, shows similar time in eval_post, and eval_getfield which suggests the expression is happening post (that's on the full index of 2.3M docs)

Indeed, if run

RT.production>select label,regex(label,'Road') AS rege from gridimage_group_stat where rege =1;
ERROR 2013 (HY000): Lost connection to MySQL server during query

which forces the expression to run 'pre' limit (as used in WHERE) then indeed searchd crashes again.

barryhunter avatar Aug 26 '22 23:08 barryhunter

MRE

mysql> drop table if exists t; CREATE TABLE t ( a int, b text ); REPLACE INTO t (id,a,b) VALUES (1,1,'abc'); SELECT CONCAT(b, TO_STRING(a)) FROM t;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.03 sec)

--------------
CREATE TABLE t ( a int, b text )
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
REPLACE INTO t (id,a,b) VALUES (1,1,'abc')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
SELECT CONCAT(b, TO_STRING(a)) FROM t
--------------

ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (61)
ERROR:
Can't connect to the server

Workaround

- b text
+ b text indexed attribute
mysql> drop table if exists t; CREATE TABLE t ( a int, b text indexed attribute); REPLACE INTO t (id,a,b) VALUES (1,1,'abc'); SELECT CONCAT(b, TO_STRING(a)) FROM t;
...
+-------------------------+
| concat(b, TO_STRING(a)) |
+-------------------------+
| abc1                    |
+-------------------------+
1 row in set (0.00 sec)

sanikolaev avatar Aug 29 '22 03:08 sanikolaev

I've taken the liberty of updating the subject, it wasn't the CONCAT itself that the issue, its using both attribute and field in same expression.

As for the workaround, the other alternative is to make a 'text stored'. If both are stored fields that also works fine! (of course then don't need the TO_STRING() in the CONCAT :)

barryhunter avatar Aug 30 '22 09:08 barryhunter

➤ Ilya Kuznetsov commented:

Disabled using stored fields in expressions in 819b5596

githubmanticore avatar Sep 05 '22 16:09 githubmanticore