robinhood icon indicating copy to clipboard operation
robinhood copied to clipboard

LIKE queries with left wildcard (%) make them unindexable (probably)

Open thiell opened this issue 7 years ago • 1 comments

I'm using rbh-report and while the filesystem is growing (but still less than 100M inodes), some commands that used to work within seconds start to be slow (several minutes). I tried to add an index on ENTRIES.fileclass but unfortunately, this doesn't seem to help.

We have slow query when using commands like:

$ rbh-report --user-info -cqS -C stanford+groups+nosnap -f /etc/robinhood.d/oak.conf
Cmd    ID      State         User      Host       DB            Time   Query                                                                                                                                                                                                                                                                                                                   
Query    1580  Sending data  robinhoo  localhost  robinhood_oa  01:32  SELECT uid as attr0,gid as
attr1,type as attr2,COUNT(*) as attr3,SUM(size) as attr4,SUM(blocks) as attr5,MIN(size) as
attr6,MAX(size) as attr7,ROUND(AVG(size)) as attr8 FROM ENTRIES WHERE ENTRIES.fileclass
LIKE BINARY '%+stanford+groups+nosnap+%' GROUP BY attr0,attr1,attr2 ORDER BY attr0
ASC,attr1 ASC,attr2 ASC

For this filesystem, MD rate is low enough that we can use additional indexes to speed up queries:

mysql> desc ENTRIES;
+------------------+--------------------------------------------------------+------+-----+---------+-------+
| Field            | Type                                                   | Null | Key | Default | Extra |
+------------------+--------------------------------------------------------+------+-----+---------+-------+
| id               | varbinary(64)                                          | NO   | PRI | NULL    |       |
| uid              | varbinary(127)                                         | YES  | MUL | unknown |       |
| gid              | varbinary(127)                                         | YES  | MUL | unknown |       |
| size             | bigint(20) unsigned                                    | YES  |     | 0       |       |
| blocks           | bigint(20) unsigned                                    | YES  |     | 0       |       |
| creation_time    | int(10) unsigned                                       | YES  |     | NULL    |       |
| last_access      | int(10) unsigned                                       | YES  |     | NULL    |       |
| last_mod         | int(10) unsigned                                       | YES  |     | NULL    |       |
| last_mdchange    | int(10) unsigned                                       | YES  |     | NULL    |       |
| type             | enum('symlink','dir','file','chr','blk','fifo','sock') | YES  |     | file    |       |
| mode             | smallint(5) unsigned                                   | YES  |     | NULL    |       |
| nlink            | int(10) unsigned                                       | YES  |     | NULL    |       |
| md_update        | int(10) unsigned                                       | YES  |     | NULL    |       |
| invalid          | tinyint(1)                                             | YES  |     | NULL    |       |
| fileclass        | varbinary(1023)                                        | YES  | MUL | NULL    |       |
| class_update     | int(10) unsigned                                       | YES  |     | NULL    |       |
| alert_status     | enum('','clear','alert')                               | YES  |     |         |       |
| modeguard_status | enum('','ok','invalid')                                | YES  |     |         |       |
| alert_lstchk     | int(10) unsigned                                       | YES  |     | 0       |       |
| alert_lstalrt    | int(10) unsigned                                       | YES  |     | 0       |       |
+------------------+--------------------------------------------------------+------+-----+---------+-------+

According to some online web pages like the ones below, a left wildcard makes LIKE queries unable to use the column index, which is pretty coherent to what I'm seeing with MySQL 5.7:

https://makandracards.com/makandra/10843-mysql-can-i-speed-up-like-queries-by-adding-an-index http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning

Would it be possible to remove the left wildcard to make it indexable and probably faster then?

thiell avatar Oct 19 '17 21:10 thiell

Or what about getting rid of LIKE at all?

This is a link that could help: https://stackoverflow.com/questions/2885564/ways-to-implement-tags-pros-and-cons-of-each

Way 3 would be more common for an SQL database, it is simply to use 2 additional tables: one containing the file classes (tags in the example above), and one other doing the mapping (taggings in the example).

This would probably be better than the current LIKE that needs to scan all entries.

thiell avatar Oct 19 '17 21:10 thiell