
Lithops: Do partial reprocessing when new databases are added

Open · LachlanStuart opened this issue 5 years ago · 0 comments

The Lithops annotation pipeline currently reprocesses all databases, even when only the list of databases has changed. This can cause significant repeated work. Copy across the logic that's in the Spark implementation:

  • If the processing settings (the ds_config excluding the database field) have changed, always reprocess all DBs (IIRC this is also indicated by a reprocess flag in the annotate message)
  • If databases have been removed, delete their corresponding job entries in PostgreSQL/Elasticsearch but don't reannotate
  • If databases have been added, annotate only the new databases
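A rough sketch of the decision logic from the list above, for illustration only. It is not the Spark implementation. The `database_ids` key, the `plan_reprocessing` function and the `ReprocessPlan` type are all hypothetical names. One assumption worth flagging: when the settings change, this sketch still deletes jobs for databases that were removed, and it assumes annotating a kept database overwrites its old job.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Set

# Hypothetical name of the ds_config field that lists the databases.
DB_KEY = "database_ids"


@dataclass
class ReprocessPlan:
    """Which databases to (re)annotate and which DBs' job entries to delete."""
    to_annotate: Set[int]
    to_delete: Set[int]


def _settings_without_dbs(ds_config: Dict[str, Any]) -> Dict[str, Any]:
    # The "processing settings": everything in ds_config except the database list.
    return {k: v for k, v in ds_config.items() if k != DB_KEY}


def plan_reprocessing(
    old_config: Optional[Dict[str, Any]],
    new_config: Dict[str, Any],
    reprocess: bool = False,
) -> ReprocessPlan:
    new_dbs = set(new_config.get(DB_KEY, []))
    if old_config is None:
        # Never annotated before: annotate every database.
        return ReprocessPlan(to_annotate=new_dbs, to_delete=set())

    old_dbs = set(old_config.get(DB_KEY, []))
    removed = old_dbs - new_dbs  # jobs to delete, never reannotated

    settings_changed = _settings_without_dbs(old_config) != _settings_without_dbs(new_config)
    if reprocess or settings_changed:
        # Settings changed (or reprocess was requested): redo all current DBs.
        return ReprocessPlan(to_annotate=new_dbs, to_delete=removed)

    # Settings unchanged: only annotate the newly added DBs.
    return ReprocessPlan(to_annotate=new_dbs - old_dbs, to_delete=removed)


if __name__ == "__main__":
    old = {DB_KEY: [1, 2], "isotope_generation": {"adducts": ["+H"]}}
    new = {DB_KEY: [2, 3], "isotope_generation": {"adducts": ["+H"]}}
    print(plan_reprocessing(old, new))
```

With unchanged settings, swapping DB 1 for DB 3 only annotates DB 3 and deletes DB 1's job entries, leaving DB 2's results untouched.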

It would be a good idea to do #859 first, as this change will cause many centroid cache misses and #859 speeds up centroid cache regeneration.

LachlanStuart, Mar 15 '21 15:03