elasticsearch-jdbc

Reindexing records doesn't discard old records that are no longer returned by SQL

Open • sanaulla123 opened this issue 8 years ago • 3 comments

We have a problem when we reindex the data from SQL (we reindex the complete data set, not just the changes since the last update): records that are no longer returned by the SQL query don't get deleted. We want the records that the SQL no longer returns to be removed from the ES index.

Any suggestions on how to go about this?

sanaulla123 • Nov 21 '16 05:11

For a large number of docs, reindex into a new timestamped index on each run and drop the previous indices once the new one is complete.
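
A minimal sketch of the timestamped-index idea in Java (9+ for `List.of`). The naming scheme `<prefix>_<UTC timestamp>`, the prefix `records` and the index names are hypothetical. The sketch only computes the new index name and the older indices that become obsolete; the actual deletion (for example an Elasticsearch delete index request, `DELETE /<index>`) is left to whatever client you use.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class TimestampedIndices {

    // Hypothetical naming scheme: <prefix>_<UTC timestamp>, e.g. records_20161121083000
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    /** Name of the fresh index that a full reindex run writes into. */
    static String newIndexName(String prefix, Instant runStart) {
        return prefix + "_" + STAMP.format(runStart);
    }

    /**
     * All indices with the same prefix except the freshly built one.
     * Dropping them after the new index is complete also discards every
     * document that the SQL query no longer returns.
     */
    static List<String> indicesToDrop(String prefix, String currentIndex, List<String> existing) {
        List<String> drop = new ArrayList<>();
        for (String name : existing) {
            if (name.startsWith(prefix + "_") && !name.equals(currentIndex)) {
                drop.add(name);
            }
        }
        return drop;
    }

    public static void main(String[] args) {
        String current = newIndexName("records", Instant.parse("2016-11-21T08:30:00Z"));
        List<String> existing = List.of("records_20161120083000", current, "other_index");
        System.out.println(current);                                     // records_20161121083000
        System.out.println(indicesToDrop("records", current, existing)); // [records_20161120083000]
    }
}
```

The fixed-width UTC timestamp keeps the names sortable, so lexicographic order matches chronological order if you ever need to keep the last few indices instead of only the newest.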

For a small number of docs, the idea is to keep a table of the IDs to be deleted. Then select from that table, returning a column _optype with the value 'delete' for each ID. Maintaining that extra table is a task for the DBA.
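
A minimal Java sketch of the deletion-table approach, assuming the importer treats the column labels `_id` and `_optype` specially. The table `deleted_ids` and its column `id` are hypothetical. `staleIds` shows which IDs belong in that table (those still indexed but no longer returned by the full SQL query); `DELETE_SQL` is the kind of statement that turns each of those rows into a delete request.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeleteTableSketch {

    // Statement run against the hypothetical deletion table "deleted_ids"
    // (column "id"). Every row carries _optype = 'delete', so each ID
    // becomes a delete request instead of an index request.
    static final String DELETE_SQL =
            "select id as _id, 'delete' as _optype from deleted_ids";

    /**
     * IDs that are present in the Elasticsearch index but missing from the
     * current full SQL result: exactly the rows the deletion table should hold.
     */
    static Set<String> staleIds(Set<String> indexedIds, Set<String> currentSqlIds) {
        Set<String> stale = new LinkedHashSet<>(indexedIds); // keep input order
        stale.removeAll(currentSqlIds);                      // set difference
        return stale;
    }

    public static void main(String[] args) {
        Set<String> indexed = new LinkedHashSet<>(List.of("1", "2", "3", "4"));
        Set<String> current = Set.of("1", "3");
        System.out.println(staleIds(indexed, current)); // [2, 4]
        System.out.println(DELETE_SQL);
    }
}
```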

jprante • Nov 21 '16 08:11

What are the possible _optype values we can use? Maybe I can use _optype = 'delete' if the status of the record is deleted, and _optype = 'someother_valid_optype' if the record is not deleted.

sanaulla123 • Nov 28 '16 06:11

You can use _optype=index, _optype=create, _optype=delete, or _optype=update.

These values map to the operation types in https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/action/DocWriteRequest.java#L113-L127
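
A minimal Java sketch of the status-based choice asked about above, using only the four _optype values listed here. The rule "status deleted means delete, everything else means index", the table `records` and the columns `status` and `name` are hypothetical. The same rule is shown twice: as a Java method and as a SQL `case` expression, so that the importer receives the right _optype per row.

```java
import java.util.Locale;

public class OpTypeSketch {

    // The four values accepted in the _optype column, mirroring the
    // operation types of Elasticsearch's DocWriteRequest.
    enum OpType {
        INDEX, CREATE, UPDATE, DELETE;

        /** Lower-case form written into the _optype column. */
        String columnValue() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    /** Hypothetical rule: deleted records are removed, all others are (re)indexed. */
    static OpType forStatus(String status) {
        return "deleted".equalsIgnoreCase(status) ? OpType.DELETE : OpType.INDEX;
    }

    // Same rule expressed in SQL; table "records" and columns "status"
    // and "name" are hypothetical.
    static final String SQL =
            "select id as _id, name, "
          + "case when status = 'deleted' then 'delete' else 'index' end as _optype "
          + "from records";

    public static void main(String[] args) {
        System.out.println(forStatus("deleted").columnValue()); // delete
        System.out.println(forStatus("active").columnValue());  // index
    }
}
```

Choosing `index` rather than `create` for live records means a record that is already in the index is overwritten instead of rejected, which suits a full reindex.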

jprante • Nov 28 '16 10:11