Elastica
Elastica copied to clipboard
Does RetryOnConflict suppose to work with Bulk updates
I have an application which updates documents in bulk through multiple workers. I'd like to use the retry_on_conflict
feature, but even if I have the setting at the ClientConfiguration, Bulk::addDocuments()
does not seem to add the necessary piece to the Document metadata in order to get the proper update action as it is described at https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-bulk.html#bulk-update
They say, every update should contain the retry_on_conflict
attribute like this:
POST _bulk
{ "update" : {"_id" : "1", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_index" : "index1", "retry_on_conflict" : 3} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}
I only get the _id
and _index
values added to the document metadata. What do I do wrong?
Can you share the exact code snippet you are using and the version of Elastica? There was in one version a change from _retry_on_conflict
to retry_on_conflict
. A quick search, i found there is a for a similar scenario you describe here: https://github.com/ruflin/Elastica/blob/0f6b04b5ebe6df51015fb54b4c455bd7af5b093f/tests/BulkTest.php#L694
Actually I'm using FOSElasticaBundle and they have added support for RetryOnConflict parameter per connection (https://github.com/FriendsOfSymfony/FOSElasticaBundle/pull/965) long time ago.
My config looks like:
fos_elastica:
clients:
default: { url: '%env(ELASTICSEARCH_URL)%', retryOnConflict: 5 }
indexes:
which sets
However I can't see if the Elastica\Bulk
class would take the parameter into consideration at all and would apply the same like in the test
https://github.com/ruflin/Elastica/blob/0f6b04b5ebe6df51015fb54b4c455bd7af5b093f/tests/BulkTest.php#L686C9
Having a quick look at the code, this might be a bug / missing feature in Elastica. I couldn't find a place where the retry_on_conflict from the agent it taken into account in bulk :-( When adding a document, to bulk, it should be checked if it is an update and if retry on conflict is set on the client side. If yes, it should add it: https://github.com/ruflin/Elastica/tree/8.x/src#L134 Could you by chance make this modification to your version of Elastica and see if it solves the problem?
Sure, I will give a try. In the mean time, could you check the link (https://github.com/ruflin/Elastica/tree/8.x/src#L134). You might want to point to a file in the repo, but it got somehow removed.
This is the file and line I wanted to link to: https://github.com/ruflin/Elastica/blob/8.x/src/Bulk.php#L134 Not sure where the file name was lost :-(
Please see my PR. It adds the RetryOnConflict value (if exists) from the Client's connection configuration. With this change, I was able to generate the similar output (with this attribute) which was pasted in the opening message.