elasticsearch-suggest-plugin
manual refresh
I was wondering: when are the suggestions loaded into memory, and how much is loaded? I set "refresh_disabled" to "false" and posted '{ "field":"suggest" }' to the API in order to create the suggest index, but curl returns too quickly. It seems that the actual indexing is delayed. The first request to your API takes a lot of time, so I conclude that the actual build of the index is delayed until then?
Thanks in advance
Hey Nicolas,
The in-memory structure is built when you execute a query for a field for the first time, so the first query can be pretty slow. From then on, the in-memory structure is refreshed at the interval you have chosen.
Does that answer your question? If not, just ask again. What exactly do you mean by "curl returns too quickly"?
I have a field "suggest" that contains shingles. So, I execute this command:
curl "http://localhost:9200/meercat/child/__suggestRefresh" -d '{ "field":"suggest" }'
which returns immediately. If the in-memory structure is built during the first query, what does this command actually do then?
Your query does not supply a term to suggest for, so it does nothing.
Ah, it needs a term. I didn't know that. I concluded from your examples that you had to supply the field name. Some of your examples do not even supply a field or term (see README.md):
curl -X POST 'localhost:9200/__suggestRefresh'
curl -X POST 'localhost:9200/products/product/__suggestRefresh'
curl -X POST 'localhost:9200/products/product/__suggestRefresh' -d '{ "field" : "ProductName.suggest" }'
What does it update in that second case?
So, this would build the whole structure for the field "suggest"?
curl "http://localhost:9200/meercat/child/__suggestRefresh" -d '{ "field":"suggest","term":"test" }'
Uh, sorry, my mistake... I read that you wanted suggestions, but you want to do a refresh. The refresh request is executed synchronously.
When you do automatic refresh, you see how long the complete refresh took. Is this duration not comparable to the runtime of the curl suggest refresh call?
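One way to compare these durations is to time the manual refresh and the first suggest request from the shell. This is only a sketch: it assumes a node at localhost:9200 and the meercat/child index and "suggest" field from the commands above, and the helper names (refresh_body, suggest_body, timed_post) are made up for illustration, not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helpers for timing the plugin endpoints used in this thread.
# ES_URL is an assumption; override it if the node runs elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# JSON body for a refresh request: field only.
refresh_body() { printf '{ "field": "%s" }' "$1"; }

# JSON body for a suggest request: field and term.
suggest_body() { printf '{ "field": "%s", "term": "%s" }' "$1" "$2"; }

# POST a body to a path and print the total time curl measured, in seconds.
# The response body itself is discarded.
timed_post() {
  curl -s -o /dev/null -w '%{time_total}\n' -X POST "$ES_URL/$1" -d "$2"
}

# Usage (needs a running node):
#   timed_post "meercat/child/__suggestRefresh" "$(refresh_body suggest)"
#   timed_post "meercat/child/__suggest" "$(suggest_body suggest test)"
```

Running the refresh call first and the suggest call right after it should show which of the two actually pays for building the in-memory structure.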
automatic refresh: 10ms (400000 documents) according to the log file
manual refresh: takes more than 1 minute (yes, sometimes it blocks, sometimes it does nothing)
How does an automatic refresh know which field to use?
And what exactly does this refreshing do? You said that the structure is built the first time you access __suggest. So that raises my question: what happens during the refreshing?
Thanks
Automatic refresh refreshes all data structures for all fields, and runs in sync.
it should do the same. Can you reproduce this with fewer documents as well?
Ok, I tried with less data. Here are the results:
num_docs  first_request  suggest_refresh
10500     2s             0s
21500     3s             0s
31500     0s             1s
Apparently the addition of the last 10000 documents did not do anything substantial.
Now I'm using 110268 documents, and each request takes 20 seconds, even if I supply the same term.
Details:
4 CPUs
8GB of RAM (4GB given to ElasticSearch)
5 shards (no replicas)
1 node
Note:
type of field is "shingle" with maximum of 30 words
type is "fuzzy" now
Refresh is disabled
I did not issue a refresh
When can I be sure that no additional processing will be done by this plugin??
Could the shingle size be the reason for the slow requests?
Are you using the latest version? Either 0.90.1-0.7 or 0.90.0-0.6.3?
Can you paste suggest stats of the slow node?
version: 0.90.1-0.7.1
stats: {"_shards":{"total":5,"successful":5,"failed":0},"fstStats":[]}
Strange, the autocomplete DOES return results, although these statistics imply otherwise.
Can you provide me with some sample data (only a few documents), your mapping, and your suggest requests, so I can reproduce this? And your configuration.yml, if it is different from the default.
Hey there!
Do you still have this issue, and can you help me reproduce it?
I'm sorry. I had totally forgotten about this.
Here is some data:
http://adore.ugent.be/local/library2.tar.gz
Just put this in /var/lib/elasticsearch/elasticsearch/nodes/0/indices
And this configuration file contains the mapping:
http://adore.ugent.be/local/elasticsearch.mapping.json
Look for search-options-index_mappings, and search-options-index_settings
Not exactly sure whether this problem still lingers.
Try this:
curl -X POST "http://localhost:9200/library2/books/__suggest" -d '{ "field":"author_shingle","term":"Dingeman" }'
It does not return any results, although this search query shows a match:
http://localhost:9200/library2/books/_search?pretty=true&q=Dingemans
Thanks!