metaspace
Lithops: Do partial reprocessing when new databases are added
The Lithops annotation pipeline currently reprocesses all databases, even when only some have been added or removed, which can cause significant repeated work. Copy across the logic that's in the Spark implementation:
- If the processing settings (the `ds_config` excluding the database field) have changed, always reprocess all DBs (IIRC this is also indicated by a `reprocess` flag in the annotate message)
- If databases have been removed, delete their corresponding `job` entries in Postgres/Elasticsearch, but don't re-annotate
- If databases have been added, annotate only the new databases
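The three rules above boil down to a set diff on database IDs, gated by a comparison of the remaining settings. A minimal sketch, assuming the database list lives under a `database_ids` key in `ds_config` (hypothetical key name) and that a full reprocess drops all existing jobs before re-annotating; `ReprocessPlan` and `plan_reprocessing` are illustrative names, not existing metaspace functions:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set

# Hypothetical key name for the database list inside ds_config.
DB_KEY = "database_ids"


@dataclass
class ReprocessPlan:
    to_annotate: Set[int] = field(default_factory=set)  # DBs to run annotation for
    to_delete: Set[int] = field(default_factory=set)    # DBs whose job entries get removed


def processing_settings(ds_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return ds_config with the database field excluded."""
    return {k: v for k, v in ds_config.items() if k != DB_KEY}


def plan_reprocessing(
    old_config: Optional[Dict[str, Any]],
    new_config: Dict[str, Any],
    reprocess: bool = False,
) -> ReprocessPlan:
    new_dbs = set(new_config.get(DB_KEY, []))
    if old_config is None:
        # Never processed before: annotate everything, nothing to clean up.
        return ReprocessPlan(to_annotate=new_dbs)

    old_dbs = set(old_config.get(DB_KEY, []))
    settings_changed = processing_settings(old_config) != processing_settings(new_config)

    if reprocess or settings_changed:
        # Settings changed: all existing results are stale (assumption: old jobs
        # are deleted and every current DB is re-annotated).
        return ReprocessPlan(to_annotate=new_dbs, to_delete=old_dbs)

    # Same settings: only the database diff matters.
    return ReprocessPlan(
        to_annotate=new_dbs - old_dbs,  # added DBs: annotate only these
        to_delete=old_dbs - new_dbs,    # removed DBs: delete jobs, no re-annotation
    )
```

For example, going from databases `[1, 2]` to `[2, 3]` with unchanged settings yields a plan that annotates only DB 3 and deletes the jobs for DB 1, leaving DB 2's results untouched.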
It would be a good idea to do #859 first, as this change will cause many centroids cache misses and #859 speeds up centroids cache regeneration.