elasticsearch-suggest-plugin

manual refresh

Open nicolasfranck opened this issue 11 years ago • 15 comments

I was wondering: when are the suggestions loaded into memory, and how much is loaded? I set "refresh_disabled" to "false", and posted '{ "field":"suggest" }' to the API in order to create the suggest index. But curl returns too quickly. It seems that the actual indexing is delayed? The first request to your API takes a lot of time, so I conclude that the actual build of the index is delayed until then?

Thanks in advance

nicolasfranck avatar May 31 '13 14:05 nicolasfranck

Hey Nicolas,

The in-memory structure is built when you execute a query for a field for the first time, so the first query can be pretty slow. From then on, the in-memory structure is refreshed at the interval you have chosen.
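The build-on-first-query behaviour described here can be illustrated with a small sketch. This is not the plugin's code: `LazySuggester`, `load_terms`, and the injectable `clock` are hypothetical names, and for simplicity the sketch rebuilds a stale structure on the next query rather than in a background task.

```python
import time


class LazySuggester:
    """Hypothetical stand-in for a per-field in-memory suggest structure."""

    def __init__(self, load_terms, refresh_interval_s, clock=time.monotonic):
        self._load_terms = load_terms        # callable returning all terms for the field
        self._interval = refresh_interval_s  # assumed refresh interval in seconds
        self._clock = clock
        self._terms = None                   # nothing is built until the first query
        self._built_at = None
        self.builds = 0                      # counts (re)builds, for illustration only

    def _build(self):
        # The expensive step: load every term and build the lookup structure.
        self._terms = sorted(self._load_terms())
        self._built_at = self._clock()
        self.builds += 1

    def suggest(self, prefix):
        now = self._clock()
        # The first query pays the build cost; later queries reuse the
        # structure until the refresh interval has elapsed.
        if self._terms is None or now - self._built_at >= self._interval:
            self._build()
        return [t for t in self._terms if t.startswith(prefix)]
```

With this model, only the first `suggest` call (and the first call after the interval has passed) is slow; every other call is a lookup against the structure that already exists.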

Does that answer your question? If not, just ask again. What exactly do you mean by "curl returns too quickly"?

spinscale avatar May 31 '13 18:05 spinscale

I have a field "suggest" that contains shingles. So, I execute this command:

curl "http://localhost:9200/meercat/child/__suggestRefresh" -d '{ "field":"suggest" }'

which returns immediately. If the in-memory structure is built during the first query, what does this command actually do then?

nicolasfranck avatar Jun 01 '13 10:06 nicolasfranck

Your query does not supply a term to suggest for, so it does nothing.

spinscale avatar Jun 01 '13 11:06 spinscale

Ah, it needs a term. I didn't know that. I concluded from your examples that you had to supply the field name. Some of your examples do not even supply a field or term (see README.md):

curl -X POST 'localhost:9200/__suggestRefresh'
curl -X POST 'localhost:9200/products/product/__suggestRefresh'
curl -X POST 'localhost:9200/products/product/__suggestRefresh' -d '{ "field" : "ProductName.suggest" }'

What does it update in that second case?

So, this would build the whole structure for the field "suggest"?

curl "http://localhost:9200/meercat/child/__suggestRefresh" -d '{ "field":"suggest", "term":"test" }'

nicolasfranck avatar Jun 01 '13 16:06 nicolasfranck

Uuuh, big sorry on my side... I read that you wanted suggestions, but you want to do a refresh. The refresh request is executed synchronously.

When you do an automatic refresh, you see how long the complete refresh took. Is this duration not comparable to the runtime of the curl suggest refresh call?
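One way to make that comparison is to time the manual refresh call directly. The following is a minimal sketch using only Python's standard library; `timed_post` and its `opener` parameter are hypothetical names introduced for illustration, and the Content-Type header is an assumption, not something the plugin is known to require.

```python
import json
import time
import urllib.request


def timed_post(url, body, opener=urllib.request.urlopen):
    """POST a JSON body and return (elapsed_seconds, raw_response_bytes)."""
    data = json.dumps(body).encode("utf-8")  # json.dumps always emits valid JSON
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    start = time.perf_counter()
    with opener(req) as resp:
        payload = resp.read()  # wait for the full response before stopping the clock
    return time.perf_counter() - start, payload
```

Calling `timed_post("http://localhost:9200/meercat/child/__suggestRefresh", {"field": "suggest"})` against a running node gives a wall-clock duration that can be set next to the duration reported for the automatic refresh.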

spinscale avatar Jun 01 '13 17:06 spinscale

automatic refresh: 10ms (400000 documents) according to the log file
manual refresh: takes more than 1 minute (yes, sometimes it blocks, sometimes it does nothing)

How does an automatic refresh know which field to use?

And what exactly does this refreshing do? You said that the structure is built the first time you access __suggest. So that raises my question: what happens during the refreshing?

Thanks

nicolasfranck avatar Jun 02 '13 16:06 nicolasfranck

Automatic refresh refreshes all data structures for all fields, and runs synchronously.

The manual refresh should do the same. Can you reproduce this with fewer documents as well?

spinscale avatar Jun 02 '13 16:06 spinscale

Ok, I tried with less data. Here are the results:

num_docs   first_request   suggest_refresh
10500      2s              0s
21500      3s              0s
31500      0s              1s

Apparently the addition of the last 10000 documents did not do anything substantial.

nicolasfranck avatar Jun 03 '13 09:06 nicolasfranck

Now I'm using 110268 documents, and each request takes 20 seconds, even if I supply the same term.

Details:
4 CPUs
8GB of RAM (4GB given to ElasticSearch)
5 shards (no replicas)
1 node

Note:
type of field is "shingle" with a maximum of 30 words
type is "fuzzy" now
Refresh is disabled
I did not issue a refresh

When can I be sure that no additional processing will be done by this plugin?

nicolasfranck avatar Jun 04 '13 10:06 nicolasfranck

Could the shingle size be the reason for the slow requests?

nicolasfranck avatar Jun 04 '13 10:06 nicolasfranck

Are you using the latest version? Either 0.90.1-0.7 or 0.90.0-0.6.3?

Can you paste suggest stats of the slow node?

spinscale avatar Jun 04 '13 14:06 spinscale

version: 0.90.1-0.7.1

stats: {"_shards":{"total":5,"successful":5,"failed":0},"fstStats":[]}

Strange, the autocomplete DOES return results, although these statistics imply otherwise.
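The stats response quoted above can be checked mechanically. This is a small sketch, assuming only the response shape shown in this thread; `summarize_suggest_stats` is a hypothetical helper, and reading an empty `fstStats` list as "no structure reported" is an interpretation, not documented plugin behaviour.

```python
import json


def summarize_suggest_stats(raw):
    """Summarize a suggest stats response given as a JSON string."""
    stats = json.loads(raw)
    shards = stats.get("_shards", {})
    fst = stats.get("fstStats", [])
    return {
        # True when no shard failed and every shard answered.
        "shards_ok": shards.get("failed", 0) == 0
        and shards.get("successful") == shards.get("total"),
        # Number of entries the node reports under fstStats.
        "fst_entries": len(fst),
    }


raw = '{"_shards":{"total":5,"successful":5,"failed":0},"fstStats":[]}'
print(summarize_suggest_stats(raw))
# prints {'shards_ok': True, 'fst_entries': 0}
```

For the response in this thread, all five shards answered but no entries are listed, which is the mismatch with the working autocomplete.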

nicolasfranck avatar Jun 04 '13 14:06 nicolasfranck

Can you provide me with some sample data (only a few documents), your mapping, and your suggest requests, so I can reproduce this? Also your configuration.yml, if it is different from the default.

spinscale avatar Jun 05 '13 07:06 spinscale

Hey there!

Do you still have this issue, and can you help me reproduce it?

spinscale avatar Aug 13 '13 19:08 spinscale

I'm sorry. I had totally forgotten about this.

Here is some data:

http://adore.ugent.be/local/library2.tar.gz

Just put this in /var/lib/elasticsearch/elasticsearch/nodes/0/indices

And this configuration file contains the mapping:

http://adore.ugent.be/local/elasticsearch.mapping.json

Look for search-options-index_mappings, and search-options-index_settings

Not exactly sure whether this problem still lingers.

Try this:

curl -X POST "http://localhost:9200/library2/books/__suggest" -d '{ "field":"author_shingle","term":"Dingeman" }'

It does not return any results, although this query does find a match:

http://localhost:9200/library2/books/_search?pretty=true&q=Dingemans

Thanks!

nicolasfranck avatar Aug 14 '13 06:08 nicolasfranck