openverse-api
Automatically clean up after failed indexing runs (original #402)
This issue has been migrated from the CC Search API repository
Author: aldenstpage
Date: Tue Jan 14 2020
Labels: Hacktoberfest, help wanted, ✨ goal: improvement, 🏷 status: label work required, 🙅 status: discontinued
When an indexing job fails (such as if a node in our Elasticsearch cluster has a full disk, or a bug in indexer-worker halts the process), the incomplete index is left inside of the Elasticsearch cluster, requiring someone to manually delete it. The indexer should detect this condition when the job starts and handle it.
The production index is determined by the image alias. The indexer should delete any index that follows the naming scheme image-<uuid> and is NOT pointed to by this alias.
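As a rough illustration of the selection rule described above, here is a minimal Python sketch. The function name find_stale_indices and its parameters are hypothetical and not taken from the indexer code, and the exact form of the <uuid> suffix is an assumption (32 hex digits or the hyphenated 8-4-4-4-12 form). It only decides which names qualify for deletion; listing the indices and deleting them is left to the caller.

```python
import re

# Assumption: "<uuid>" is either 32 hex digits or the hyphenated 8-4-4-4-12 form.
_UUID = r"(?:[0-9a-f]{32}|[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"


def find_stale_indices(all_indices, live_indices, alias="image"):
    """Return names matching '<alias>-<uuid>' that the alias does not point to.

    all_indices:  every index name currently in the cluster
    live_indices: the index names the alias currently points to
    """
    pattern = re.compile(rf"{re.escape(alias)}-{_UUID}", re.IGNORECASE)
    live = set(live_indices)
    return sorted(
        name
        for name in all_indices
        # Only touch indices that follow the naming scheme and are not live.
        if pattern.fullmatch(name) and name not in live
    )
```

With the elasticsearch-py client, the two inputs could presumably be gathered from es.indices.get_alias and each returned name passed to es.indices.delete(index=name). Keeping the selection logic separate from the client calls makes it easy to test without a cluster.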
Original Comments:
hedonhermdev commented on Sat Feb 22 2020:
Can I work on this issue?
CodeMonk263 commented on Sun Feb 23 2020:
Can I work on this issue?
kgodey commented on Tue Feb 25 2020:
@hedonhermdev go ahead. @CodeMonk263 please find another issue to work on since @hedonhermdev commented first.
DantrazTrev commented on Sat Feb 29 2020:
@hedonhermdev are you still working on this issue?
hedonhermdev commented on Sat Feb 29 2020:
No.
DantrazTrev commented on Sat Feb 29 2020:
Can I take it over? @aldenstpage
kgodey commented on Tue Mar 03 2020:
Go ahead @DantrazTrev
tushar912 commented on Fri Oct 02 2020:
@DantrazTrev are you still working on this?
kgodey commented on Fri Oct 02 2020:
@tushar912 it's been a few months since @DantrazTrev's post, I think you can go ahead and work on this.
tushar912 commented on Fri Oct 02 2020:
Ok
tushar912 commented on Tue Oct 06 2020:
The way I understand this issue is as follows. The main indexing job is done by indexer.py in ingestion_server. The TableIndexer class contains a method _index_table which checks whether the database is in sync with the index and replicates if not. There are two methods of indexing: reindex, which creates a new index and makes the live alias point to it, and update, which updates the existing index. Currently, if the index is not created successfully during reindex, it still persists in the cluster, so the job is to delete the index if indexing fails. @kgodey or @aldenstpage, please tell me if I have understood correctly.
tushar912 commented on Tue Oct 06 2020:
Also, I am thinking of modifying the existing consistency_check method and adding it to reindex so that the index is deleted if it was not indexed properly. Am I on the right track?
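For discussion, the approach proposed above might look roughly like the Python sketch below. Everything in it is hypothetical: reindex_with_cleanup, IndexingFailed, and the callables passed in are stand-ins, not the actual TableIndexer methods, and the real consistency_check may have a different signature.

```python
class IndexingFailed(Exception):
    """Raised when a freshly built index does not pass the consistency check."""


def reindex_with_cleanup(new_index, create, populate, is_consistent, go_live, delete):
    """Build new_index and promote it, deleting it if anything goes wrong.

    All callables take the index name; they are hypothetical stand-ins for
    whatever the indexer uses to talk to Elasticsearch.
    """
    create(new_index)
    try:
        populate(new_index)
        if not is_consistent(new_index):
            raise IndexingFailed(f"{new_index} is out of sync with the database")
    except Exception:
        # Remove the incomplete index so nobody has to delete it manually.
        delete(new_index)
        raise
    # Only reached when population and the consistency check both succeeded.
    go_live(new_index)
```

Note that this only covers failures the process survives long enough to handle. If the worker is killed outright, the incomplete index would still be left behind, which is why the original description asks for detection when the next job starts.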