elasticsearch-jdbc
Reindexing records doesn't discard old records which no longer are returned by SQL
When we reindex data from SQL (a full reindex of the complete data set, not just the changes since the last update), records that the SQL query no longer returns are not deleted from the index. We would like records that are not returned by the SQL query to be removed from the ES index.
Any suggestions on how to go about this?
For a large number of docs, use timestamped indices for reindexing and drop previous indices.
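As a rough illustration of the timestamped-index idea, the sketch below only computes names: one for the index that a full reindex run would write to, and the list of older indices of the same family that could be dropped afterwards. The helper names, the `products` base name, and the timestamp format are assumptions made for this example; creating and deleting the indices themselves would go through the usual Elasticsearch index APIs and is not shown.

```python
from datetime import datetime, timezone


def timestamped_index(base: str, now: datetime) -> str:
    """Build an index name such as 'products-20240131120000' for one reindex run."""
    return f"{base}-{now.strftime('%Y%m%d%H%M%S')}"


def indices_to_drop(existing: list[str], base: str, current: str) -> list[str]:
    """Return every other index of the same family; `current` is kept."""
    prefix = base + "-"
    return sorted(
        name for name in existing if name.startswith(prefix) and name != current
    )


# Hypothetical run: 'products' and the index list are made up for the example.
now = datetime(2024, 1, 31, 12, 0, 0, tzinfo=timezone.utc)
current = timestamped_index("products", now)
existing = [
    "products-20240130120000",
    "products-20240129120000",
    current,
    "orders-20240130120000",
]
print(current)
print(indices_to_drop(existing, "products", current))
```

Because each run writes into a fresh index, records that the SQL no longer returns never reach the new index, so nothing has to be deleted document by document.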
For a small number of docs, the idea is to keep a table of the IDs to be deleted, then select from that table with an extra column _optype = 'delete'. Maintaining that extra table is a task for the DBA.
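A minimal sketch of what such a select could look like, run here against an in-memory SQLite database just to show the shape of the rows. The table name `deleted_ids`, its `id` column, and the use of `_id` as the document ID column are assumptions for this example; the real table layout depends on your schema.

```python
import sqlite3

# In-memory stand-in for the real database; 'deleted_ids' is a hypothetical table
# that the DBA would maintain with the IDs of rows removed from the source data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deleted_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO deleted_ids (id) VALUES (?)", [(17,), (42,)])

# Every row selected from the deletion table carries _optype = 'delete'.
DELETE_SQL = "SELECT id AS _id, 'delete' AS _optype FROM deleted_ids ORDER BY id"

cursor = conn.execute(DELETE_SQL)
columns = [description[0] for description in cursor.description]
rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
print(rows)
```

The statement in `DELETE_SQL` is the part that would go into the importer's SQL definition; the SQLite setup around it exists only to make the example runnable.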
What are the possible _optype values we can use? Maybe I can use _optype = 'delete' if the status of the record is deleted, or _optype = 'someother_valid_optype' if the record is not deleted.
You can use _optype=index, _optype=create, _optype=delete, or _optype=update.
It maps to https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/action/DocWriteRequest.java#L113-L127
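If deleted records stay in the source table with a status flag, as the question suggests, one way to pick the operation per row is a CASE expression. This is only a sketch: the `products` table, its `status` column, and the value 'deleted' are assumptions, and SQLite is used here purely to show the resulting rows.

```python
import sqlite3

# Hypothetical source table with a soft-delete status column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO products (id, name, status) VALUES (?, ?, ?)",
    [(1, "lamp", "active"), (2, "chair", "deleted")],
)

# Rows flagged as deleted get _optype = 'delete'; all others are indexed.
OPTYPE_SQL = """
SELECT id AS _id,
       CASE WHEN status = 'deleted' THEN 'delete' ELSE 'index' END AS _optype,
       name
FROM products
ORDER BY id
"""

rows = conn.execute(OPTYPE_SQL).fetchall()
print(rows)
```

This only works when the deleted rows are still present in the table; rows that have been physically removed cannot be selected, which is why the separate table of IDs was suggested above.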