openverse-api

Automatically clean up after failed indexing runs (original #402)

Open obulat opened this issue 4 years ago • 0 comments

This issue has been migrated from the CC Search API repository

Author: aldenstpage
Date: Tue Jan 14 2020
Labels: Hacktoberfest,help wanted,✨ goal: improvement,🏷 status: label work required,🙅 status: discontinued

When an indexing job fails (such as if a node in our Elasticsearch cluster has a full disk, or a bug in indexer-worker halts the process), the incomplete index is left inside of the Elasticsearch cluster, requiring someone to manually delete it. The indexer should detect this condition when the job starts and handle it.

The production index is determined by the image alias. The indexer should delete any index that follows the naming scheme image-<uuid> and is NOT pointed to by this alias.
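
A minimal sketch of that selection rule, not code from the repository. It assumes the index-to-alias mapping has the shape returned by the Elasticsearch get-alias API (`{index_name: {"aliases": {alias_name: {...}}}}`). The function name `find_stale_indices` and the accepted UUID spellings (32 hex digits, or the hyphenated 8-4-4-4-12 form) are assumptions made for illustration.

```python
import re

# Assumption: the <uuid> suffix is either 32 hex digits or the hyphenated form.
_UUID_SUFFIX = re.compile(
    r"^(?:[0-9a-f]{32}"
    r"|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)


def find_stale_indices(alias_map, alias="image"):
    """Return indices named <alias>-<uuid> that the alias does not point to.

    alias_map is assumed to look like the get-alias response:
    {index_name: {"aliases": {alias_name: {...}}}}.
    """
    prefix = alias + "-"
    stale = []
    for name, info in alias_map.items():
        if not name.startswith(prefix):
            continue  # belongs to another naming scheme; leave it alone
        if not _UUID_SUFFIX.match(name[len(prefix):]):
            continue  # right prefix, but the suffix is not a UUID
        if alias in info.get("aliases", {}):
            continue  # this is the live production index
        stale.append(name)
    return sorted(stale)
```

With the elasticsearch-py client, the mapping could plausibly come from `es.indices.get_alias(index="image-*")`, and each returned name could be passed to `es.indices.delete(index=name)` when the job starts. Keeping the selection in a pure function makes the rule easy to test without a cluster.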


Original Comments:

hedonhermdev commented on Sat Feb 22 2020:

Can I work on this issue?

CodeMonk263 commented on Sun Feb 23 2020:

Can I work on this issue?

kgodey commented on Tue Feb 25 2020:

@hedonhermdev go ahead. @CodeMonk263 please find another issue to work on since @hedonhermdev commented first.

DantrazTrev commented on Sat Feb 29 2020:

@hedonhermdev are you still working on this issue?

hedonhermdev commented on Sat Feb 29 2020:

No.

DantrazTrev commented on Sat Feb 29 2020:

Can I take it over? @aldenstpage

kgodey commented on Tue Mar 03 2020:

Go ahead @DantrazTrev

tushar912 commented on Fri Oct 02 2020:

@DantrazTrev are you still working on this?

kgodey commented on Fri Oct 02 2020:

@tushar912 it's been a few months since @DantrazTrev's post, I think you can go ahead and work on this.

tushar912 commented on Fri Oct 02 2020:

Ok

tushar912 commented on Tue Oct 06 2020:

The way I understand this issue is as follows. The main indexing job is done by indexer.py in ingestion_server. The TableIndexer class contains a method, _index_table, which checks whether the database is in sync with the index and replicates if not. There are two methods of indexing: reindex, which creates a new index and points the live alias at it, and update, which updates the existing index. Currently, if the index is not created successfully during reindex, it still persists in the cluster, so the job is to delete the index if indexing fails. @kgodey or @aldenstpage, please tell me if I have understood correctly.

tushar912 commented on Tue Oct 06 2020:

Also, I am thinking of modifying the existing consistency_check method and adding it to reindex so that the index is deleted if it is not indexed properly. Am I on the right track?
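
For illustration only, the delete-on-failure idea discussed in the comments above could be sketched as follows. This is not the ingestion server's code: `reindex_with_cleanup` and the three callables (`build_index`, `delete_index`, `promote`) are hypothetical stand-ins for whatever the indexer actually uses, and the `uuid4().hex` suffix is an assumption about how the `image-<uuid>` name is generated.

```python
import uuid


def reindex_with_cleanup(build_index, delete_index, promote, alias="image"):
    """Build a fresh index and promote it; delete it if the build fails.

    All three callables are hypothetical:
      build_index(name)    creates and fills the new index
      delete_index(name)   removes it (should tolerate a missing index)
      promote(alias, name) points the live alias at the new index
    """
    new_index = f"{alias}-{uuid.uuid4().hex}"
    try:
        build_index(new_index)
    except Exception:
        # Do not leave a half-built index behind, then surface the error.
        delete_index(new_index)
        raise
    promote(alias, new_index)
    return new_index
```

An exception handler like this only covers failures the process survives. If the worker is killed outright, the leftover index would still have to be detected and removed when the next job starts, as the issue description asks.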

obulat, Apr 21 '21 12:04