robinhood
robinhood copied to clipboard
LIKE queries with left wildcard (%) make them unindexable (probably)
I'm using rbh-report and while the filesystem is growing (but still less than 100M inodes), some commands that used to work within seconds start to be slow (several minutes). I tried to add an index on ENTRIES.fileclass but unfortunately, this doesn't seem to help.
We have slow query when using commands like:
$ rbh-report --user-info -cqS -C stanford+groups+nosnap -f /etc/robinhood.d/oak.conf
Cmd ID State User Host DB Time Query
Query 1580 Sending data robinhoo localhost robinhood_oa 01:32 SELECT uid as attr0,gid as
attr1,type as attr2,COUNT(*) as attr3,SUM(size) as attr4,SUM(blocks) as attr5,MIN(size) as
attr6,MAX(size) as attr7,ROUND(AVG(size)) as attr8 FROM ENTRIES WHERE ENTRIES.fileclass
LIKE BINARY '%+stanford+groups+nosnap+%' GROUP BY attr0,attr1,attr2 ORDER BY attr0
ASC,attr1 ASC,attr2 ASC
For this filesystem, MD rate is low enough that we can use additional indexes to speed up queries:
mysql> desc ENTRIES;
+------------------+--------------------------------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------------------------------------------------+------+-----+---------+-------+
| id | varbinary(64) | NO | PRI | NULL | |
| uid | varbinary(127) | YES | MUL | unknown | |
| gid | varbinary(127) | YES | MUL | unknown | |
| size | bigint(20) unsigned | YES | | 0 | |
| blocks | bigint(20) unsigned | YES | | 0 | |
| creation_time | int(10) unsigned | YES | | NULL | |
| last_access | int(10) unsigned | YES | | NULL | |
| last_mod | int(10) unsigned | YES | | NULL | |
| last_mdchange | int(10) unsigned | YES | | NULL | |
| type | enum('symlink','dir','file','chr','blk','fifo','sock') | YES | | file | |
| mode | smallint(5) unsigned | YES | | NULL | |
| nlink | int(10) unsigned | YES | | NULL | |
| md_update | int(10) unsigned | YES | | NULL | |
| invalid | tinyint(1) | YES | | NULL | |
| fileclass | varbinary(1023) | YES | MUL | NULL | |
| class_update | int(10) unsigned | YES | | NULL | |
| alert_status | enum('','clear','alert') | YES | | | |
| modeguard_status | enum('','ok','invalid') | YES | | | |
| alert_lstchk | int(10) unsigned | YES | | 0 | |
| alert_lstalrt | int(10) unsigned | YES | | 0 | |
+------------------+--------------------------------------------------------+------+-----+---------+-------+
According to some online web pages like the ones below, a left wildcard makes LIKE
queries unable to use the column index, which is pretty coherent to what I'm seeing with MySQL 5.7:
https://makandracards.com/makandra/10843-mysql-can-i-speed-up-like-queries-by-adding-an-index http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning
Would it be possible to remove the left wildcard to make it indexable and probably faster then?
Or what about getting rid of LIKE
at all?
This is a link that could help: https://stackoverflow.com/questions/2885564/ways-to-implement-tags-pros-and-cons-of-each
Way 3 would be more common for an SQL database, it is simply to use 2 additional tables: one containing the file classes (tags in the example above), and one other doing the mapping (taggings in the example).
This would probably be better than the current LIKE
that needs to scan all entries.