manticoresearch
Slow SNIPPET() performance on an agent index
Hi,
To simplify the issue, I have 2 indexes:
....
index WebPages
{
type = distributed
local = WebPages0
local = WebPages1
local = WebPages2
local = WebPages3
}
index WebPagesLB
{
type = distributed
agent_persistent = sphinxserver:9312:WebPages
ha_strategy = nodeads
}
When I execute SNIPPET on WebPages, the query takes ~40 ms:
SELECT Id, SNIPPET(Body, QUERY()) FROM WebPages WHERE MATCH('modele');
Now I execute SNIPPET on WebPagesLB and the query takes 1.2 s!
SELECT Id, SNIPPET(Body, QUERY()) FROM WebPagesLB WHERE MATCH('modele');
If I remove the SNIPPET call, the result time is the same.
sphinxserver is localhost.
Why?
➤ Sergey Nikolaev commented:
I can't reproduce it like this:
snikolaev@dev:~$ cat csv_dist.conf
source src {
type = csvpipe
csvpipe_command = for n in `seq 1 100000`; do echo -n "$n,"; echo $n|md5sum|head -c 10; echo; done
# csvpipe_field = f
csvpipe_field_string = f
}
index idx1 {
type = plain
source = src
path = idx1
dict = keywords
access_plain_attrs = mlock
access_blob_attrs = mlock
access_doclists = mlock
access_hitlists = mlock
min_infix_len = 2
# stored_fields = f
}
index idx2:idx1 {
path = idx2
}
index idx3:idx1 {
path = idx3
}
index idx4:idx1 {
path = idx4
}
index dist {
type = distributed
local = idx1
local = idx2
local = idx3
local = idx4
}
index distp {
type = distributed
agent_persistent = localhost:9316:dist
ha_strategy = nodeads
}
searchd {
listen = 127.0.0.1:9315:mysql41
listen = 127.0.0.1:9316
log = sphinx_min.log
pid_file = /home/snikolaev/9315.pid
binlog_path =
qcache_max_bytes = 0
}
mysql> SELECT Id, SNIPPET(f, QUERY()) FROM distp WHERE MATCH('*ab*') limit 0; show meta;
Empty set (0.01 sec)
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 1000 |
| total_found | 10700 |
| time | 0.010 |
| keyword[0] | *ab* |
| docs[0] | 13700 |
| hits[0] | 13700 |
+---------------+-------+
6 rows in set (0.00 sec)
mysql> SELECT Id, SNIPPET(f, QUERY()) FROM dist WHERE MATCH('*ab*') limit 0; show meta;
Empty set (0.01 sec)
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 1000 |
| total_found | 10700 |
| time | 0.004 |
| keyword[0] | *ab* |
| docs[0] | 13700 |
| hits[0] | 13700 |
+---------------+-------+
6 rows in set (0.00 sec)
Please provide a reproducible case. Feel free to upload your indexes and config to our ftp - https://mnt.cr/ftp
I uploaded 2 files to the FTP:
- maticore.conf
- data.zip: the .spX files
You can reproduce the issue on Debian 10 and Manticore 3.6.0 96d61d8bf@210504 release
For testing, you can modify /etc/hosts to redirect SERVERX to localhost:
agent_persistent = SERVER1:9312|SERVER2:9312|SERVER3:9312|SERVER4:9312:WebPages
Thank you! I could reproduce the issue on our side. I could also reproduce:
mysql> SELECT Id, SNIPPET(Body, QUERY()) FROM WebPagesLB WHERE MATCH('modele');
ERROR 1064 (42000): index WebPagesLB: agent localhost:9312: agent has 32-bit docids; no longer supported
➤ Aleksey N. Vinogradov commented:
That is because of the implicit limit for remotes. For local agents the default limit is 20. For remotes it is 1000. So, when you query the balancer, it sends the request to a mirror with an internal max_matches=1000. Then it retrieves ALL of those matches and returns you 20 (or whatever limit is set). By default we are geared toward aggregations: if you want something like avg() over several different agents, or even count/count(distinct), we need many matches to be precise. But the same code path is in play even for a single mirror, where such behavior looks too cruel.
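The effect described above can be illustrated with a small toy model. This is only a sketch, not Manticore code: `build_snippet`, `search`, `query_local` and `query_via_balancer` are hypothetical names, and the model assumes the snippet is built for every match the mirror returns, which the explanation implies but does not state outright. The two limits (20 and 1000) are the defaults quoted above.

```python
# Toy model only: none of this is Manticore code, all names are hypothetical.
LOCAL_DEFAULT_LIMIT = 20      # default limit for a direct query, as quoted above
REMOTE_DEFAULT_LIMIT = 1000   # implicit limit used for remote agents, as quoted above


def build_snippet(body: str, keyword: str, around: int = 20) -> str:
    """Stand-in for SNIPPET(): cut a window of text around the first keyword hit."""
    pos = body.find(keyword)
    if pos < 0:
        return body[: 2 * around]
    start = max(0, pos - around)
    return body[start : pos + len(keyword) + around]


def search(docs, keyword, limit, counter):
    """Return up to `limit` (id, snippet) rows and count every snippet built."""
    rows = []
    for doc_id, body in docs:
        if keyword in body:
            if len(rows) == limit:
                break
            counter["snippets"] += 1
            rows.append((doc_id, build_snippet(body, keyword)))
    return rows


def query_local(docs, keyword, counter, limit=LOCAL_DEFAULT_LIMIT):
    """Direct query: only `limit` snippets are ever built."""
    return search(docs, keyword, limit, counter)


def query_via_balancer(docs, keyword, counter, limit=LOCAL_DEFAULT_LIMIT):
    """Balancer query: the mirror is asked for the larger implicit limit,
    all of those rows come back, and only then is the client's limit applied."""
    remote_rows = search(docs, keyword, REMOTE_DEFAULT_LIMIT, counter)
    return remote_rows[:limit]


docs = [(i, f"page {i} about the modele of things") for i in range(5000)]

local_counter = {"snippets": 0}
balanced_counter = {"snippets": 0}
local_rows = query_local(docs, "modele", local_counter)
balanced_rows = query_via_balancer(docs, "modele", balanced_counter)

# Same 20 rows reach the client, but the balancer path built 50x more snippets.
print(local_counter["snippets"], balanced_counter["snippets"])
```

In this model both paths return the same rows to the client, while the balancer path does the snippet work for every match the mirror sends back. That is the same order of magnitude as the gap reported in the question (~40 ms versus 1.2 s), although the model says nothing about where the real time is spent.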