Documents sometimes not correctly updated resulting in data loss.

Open tom-pryor opened this issue 10 years ago • 5 comments

While indexing individual documents at a moderate rate (5 or so per second), we're noticing some data loss. A new document is indexed, then shortly after (a few seconds, longer than refresh_interval, i.e. the document is already indexed) we attempt to update a field in that document using Elastica.

However, an "Undefined index _version" error at:

https://github.com/ruflin/Elastica/blob/master/lib/Elastica/Type.php#L248

occasionally occurs when updating the document. When this happens, the whole document is replaced with only the updated field(s), causing data loss.

The error logging tool we use records the context, and the value of $result is very strange. It's an array of 4 elements:

Key        Value
_shards    Array of length 3
hits       Array of length 3
timed_out  false
took       414

Which seems to indicate a query was performed rather than fetching the document by id.
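For reference, the two response shapes are easy to tell apart: a search response carries `took`, `timed_out`, `_shards` and `hits`, while an index or update response carries `_id` and `_version`. Below is a minimal sketch of a guard that could run on the decoded response before `_version` is read. The helper name `looksLikeDocumentResponse()` and the sample values are hypothetical and not part of Elastica.

```php
<?php
// Hypothetical helper (not part of Elastica): returns true only when the
// decoded response body looks like a single-document index/update response.
function looksLikeDocumentResponse(array $result): bool
{
    // A search-style body carries "hits"; treat it as the wrong response.
    if (array_key_exists('hits', $result)) {
        return false;
    }

    // Index and update responses identify the document and its version.
    return array_key_exists('_id', $result)
        && array_key_exists('_version', $result);
}

// Shape reported in this issue (keys only; values are placeholders).
$unexpected = ['_shards' => [], 'hits' => [], 'timed_out' => false, 'took' => 414];

// Shape normally expected from an index or update call (sample values).
$expected = ['_index' => 'idx', '_type' => 'doc', '_id' => '1', '_version' => 2];

var_dump(looksLikeDocumentResponse($unexpected)); // bool(false)
var_dump(looksLikeDocumentResponse($expected));   // bool(true)
```

A caller could skip the update and log the raw body when the guard returns false, rather than continuing with a response that belongs to a different request.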

Running Elasticsearch 1.0.1 and Elastica 1.0.0.

I'll try and see if I can get some more information.

tom-pryor, Mar 03 '14 02:03

I've been playing around, and the issue seems related to the persistent curl connection. I logged the responses, and with a persistent connection curl occasionally returns either a stale response (i.e. the result of a previous request) or a blank one.

tom-pryor, Mar 03 '14 15:03

Did you try to turn off persistent? https://github.com/ruflin/Elastica/blob/master/lib/Elastica/Client.php#L38

It will make it slower, but perhaps it solves the problem.
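A minimal sketch of disabling persistent connections, assuming Elastica is installed through Composer and Elasticsearch listens on localhost:9200 (both are assumptions; adjust to your setup). `persistent` is the client config key from the line linked above.

```php
<?php
// Assumes Elastica was installed with Composer; adjust the autoloader path if not.
require 'vendor/autoload.php';

$client = new \Elastica\Client([
    'host'       => 'localhost', // assumed host
    'port'       => 9200,        // assumed port
    'persistent' => false,       // open a fresh connection for each request
]);
```

Each request then pays the connection setup cost, which is the slowdown mentioned above.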

Another good option is to use bulk queries if you have a lot of requests.

ruflin, Mar 05 '14 19:03

Yeah, turning off persistent fixes the problem, although I'm not sure why the issue occurs with persistent enabled. It seems like strange behaviour.

I'd use bulk queries, but the problem is that we're indexing data received over an API (i.e. we have no control over when data comes in), and it needs to be available to search pretty much instantly.

tom-pryor, Mar 07 '14 16:03

Which PHP and curl versions are you using?

ruflin, Mar 08 '14 07:03

@Tomdarkness Can you check if this change resolves your problem? https://github.com/ruflin/Elastica/pull/567/files

ruflin, Mar 27 '14 22:03