Add support for external versions on Update API
There is currently an explicit lack of support for external version numbers in Update API requests, as mentioned in https://github.com/elastic/elasticsearch/issues/25996. I understand the reasoning; however, I have a use case that would benefit greatly from this functionality.
Currently, I have a worker process that receives a list of documents to update and the external version numbers of the documents to update. It then does a series of expensive MySQL queries to generate the contents of the document and then indexes the document in Elasticsearch with the external version number. Elasticsearch nicely provides the mechanism to reject stale updates using the external version numbers.
I am trying to optimize the case where I know that only a couple of fields were updated in a specific change (say the change from version 24 to 42) and am forced to do the following in order to ensure correctness:
- Get the existing document in Elasticsearch.
- Compare the external version with the "pre-change" external version (24).
- Update the document in memory with the new field values.
- Index the document in Elasticsearch with the "post-change" external version (42).
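For illustration, the four steps above can be sketched in Python roughly as follows. This is a minimal sketch rather than tested production code: the `client` argument is assumed to behave like the official Python client (a `get` call whose response carries `_version` and `_source`, and an `index` call that accepts `version` and `version_type`), while the function name and `StaleDocumentError` are hypothetical names introduced here.

```python
from typing import Any, Dict


class StaleDocumentError(Exception):
    """Raised when the stored version is not the expected pre-change version."""


def apply_change_with_round_trip(
    client: Any,
    index: str,
    doc_id: str,
    old_version: int,
    new_version: int,
    changes: Dict[str, Any],
) -> Dict[str, Any]:
    # 1. Get the existing document (full source travels over the wire).
    existing = client.get(index=index, id=doc_id)

    # 2. Compare the stored external version with the "pre-change" version.
    if existing["_version"] != old_version:
        raise StaleDocumentError(
            f"expected version {old_version}, found {existing['_version']}"
        )

    # 3. Update the document in memory with the new field values.
    document = {**existing["_source"], **changes}

    # 4. Index the whole document again with the "post-change" version.
    #    Assumption: keyword names follow the official Python client.
    client.index(
        index=index,
        id=doc_id,
        document=document,
        version=new_version,
        version_type="external",
    )
    return document
```

Note that nothing here makes the check in step 2 and the write in step 4 atomic; only the external version check performed by the index call itself protects against a concurrent writer.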
It would be great if I could do something like this to remove the round trip of the full document contents to and from Elasticsearch:
- Get the new field values from the DB.
- Update the document in Elasticsearch, specifying the version that the document must be in before changes are applied, and specifying the version that the document will be in after changes are applied. If these checks fail, then return an error.
The Update API would look something like this:
POST /<index>/_update/<_id>?oldVersion=24&version=42&version_type=external_gte
{
"changedField": "newValue"
}
This would update the document if and only if the change was able to increase the version number from 24 to 42 atomically, and fail otherwise. Thus, it still preserves the semantics of the Update API while not breaking external version numbers on the updated document. Would it be possible to implement this in Elasticsearch?
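To make the requested semantics concrete, here is a toy in-memory model in Python. It is not Elasticsearch code: the `store` dictionary, `update_if_version`, and `VersionConflict` are hypothetical names, and the rule that the new version must be greater than or equal to the old one is an assumption meant to mirror `external_gte`.

```python
from typing import Any, Dict, Tuple

# Toy store: document id -> (external version, source fields).
Store = Dict[str, Tuple[int, Dict[str, Any]]]


class VersionConflict(Exception):
    """Raised when the compare-and-set on the version fails."""


def update_if_version(
    store: Store,
    doc_id: str,
    old_version: int,
    new_version: int,
    changes: Dict[str, Any],
) -> None:
    current_version, source = store[doc_id]

    # The document must be at exactly the "pre-change" version.
    if current_version != old_version:
        raise VersionConflict(
            f"expected version {old_version}, found {current_version}"
        )

    # Assumed external_gte-like rule: the version may not go backwards.
    if new_version < old_version:
        raise VersionConflict(
            f"new version {new_version} is older than {old_version}"
        )

    # Merge only the changed fields and move to the "post-change" version.
    store[doc_id] = (new_version, {**source, **changes})


store: Store = {"1": (24, {"changedField": "oldValue", "other": 1})}
update_if_version(store, "1", 24, 42, {"changedField": "newValue"})
```

In this single-threaded model the check and the write are trivially atomic; in Elasticsearch the equivalent check would have to happen together with the write on the shard, which is the part a client cannot do by itself.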
Pinging @elastic/es-distributed (Team:Distributed)
Just wanted to add my 2 cents to this: I ran into almost the exact same problem as @kylelyk.
We needed to perform a partial update with an external versioning system. We require external versioning because of its non-incrementing behaviour, and because we may index versions out of order (in which case the older versions are discarded, since indexing them fails).
As we couldn't use a partial update, we now have custom code that calls DocumentExists and then calls Index with our specific versioning logic.
We would have been able to do it in one call if partial update supported external versions. Cheers all
If just incrementing the version is OK, maybe this workaround could help? (It does not seem possible to modify the _version field inside a script.)
POST /<index>/_update/1
{
  "script": {
    "source": """
      if (ctx._version == params._old_external_version) {
        for (Map.Entry e : params.entrySet()) {
          if (!e.getKey().equals("_old_external_version"))
            ctx._source[e.getKey()] = e.getValue();
        }
      } else {
        throw new Exception("version_conflict");
      }
    """,
    "params": {
      "field1": "value1",
      "_old_external_version": 24
    }
  }
}
It would also be interesting to throw a Version Conflict Exception, but I don't know how to do that (the response is 400 instead of 409).
You can also replace the exception with ctx.op = 'noop' if you don't want the request to fail: the response will then contain "result" : "noop" instead.
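As a rough illustration, the request body for this workaround could be built in Python like this. The helper name `build_guarded_update` and its `fail_on_conflict` flag are hypothetical; the script text is taken from the comment above, with the else branch switched between the exception and the `ctx.op = 'noop'` variant. As in that comment, the check runs against `ctx._version`, so this only helps where plain version incrementing is acceptable.

```python
from typing import Any, Dict

# Painless script from the workaround; __ON_CONFLICT__ is a local
# placeholder (not Painless syntax) replaced before the body is sent.
_SCRIPT = """
if (ctx._version == params._old_external_version) {
  for (Map.Entry e : params.entrySet()) {
    if (!e.getKey().equals("_old_external_version"))
      ctx._source[e.getKey()] = e.getValue();
  }
} else {
  __ON_CONFLICT__
}
"""

_RESERVED = "_old_external_version"


def build_guarded_update(
    changes: Dict[str, Any],
    old_version: int,
    fail_on_conflict: bool = True,
) -> Dict[str, Any]:
    # The script reads the expected version from the same params map
    # as the changed fields, so that key cannot be used as a field name.
    if _RESERVED in changes:
        raise ValueError(f"{_RESERVED!r} is reserved for the version check")

    if fail_on_conflict:
        on_conflict = 'throw new Exception("version_conflict");'
    else:
        on_conflict = "ctx.op = 'noop';"

    return {
        "script": {
            "source": _SCRIPT.replace("__ON_CONFLICT__", on_conflict),
            "params": {**changes, _RESERVED: old_version},
        }
    }


body = build_guarded_update({"field1": "value1"}, old_version=24)
```

The resulting dictionary would be sent as the JSON body of `POST /<index>/_update/<_id>`.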
It's a pity that this feature does not have enough priority :-(
If one uses external versioning for regular indexing, it is quite surprising that it is not supported for partial document updates.
I also ran into the same problems. Hopefully this will be resolved soon.
I can't agree more.
A further ask here: would it be possible to have different versions for specific fields?
I have an index with documents that get populated from various messages. To deal with out-of-order messages, it would be nice to use versioning, but these are partial updates to the documents, so I can't use the Index API. Also, when updating only a single field, I'd want the version set for that field and not for the other fields.
This ask is similar to column level client timestamps in Cassandra.