dataall
Better handling of "out of sync" Tables
Is your idea related to a problem? Please describe.
data.all already has logic for when a user manually deletes a table (shares are checked, the table is removed, permissions are cleaned up, etc.), but we should think about how best to handle table syncs (I am not sure what the correct answer is here). For instance:
- Either when a user starts a manual sync or when the scheduled table_syncer ECS task runs (ultimately both run `DatasetTableService.sync_existing_tables()` at some point):
- If the Glue table exists in data.all but does not exist in the Glue response, we update the table status to `Deleted`
- On the UI we no longer show those tables, because we filter for `!= Deleted` (ref: `DatasetService.paginated_dataset_tables()`)
- But these tables still do have associated permission records for who should be able to access the table
- And if we do clean them up right away but the table DOES still exist in Glue (e.g. some other error in the API response, or similar, incorrectly returns 0 tables), then we have potentially just broken existing access or shares if the table re-appears the next time a user hits `Sync`
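The sync behaviour described above can be modelled in a few lines. This is only a simplified sketch of the status flip, not the actual body of `DatasetTableService.sync_existing_tables()`; `TableRecord` and `reconcile` are hypothetical names introduced for illustration. It shows why an incorrectly empty Glue response is risky: every table flips to `Deleted` in one pass, and flips back on the next good sync.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class TableRecord:
    """Hypothetical stand-in for a data.all table row."""
    name: str
    status: str = "InSync"


def reconcile(records: Iterable[TableRecord], glue_table_names: Set[str]) -> List[str]:
    """Set each record's status from the Glue listing; return names whose status changed."""
    changed = []
    for record in records:
        new_status = "InSync" if record.name in glue_table_names else "Deleted"
        if record.status != new_status:
            record.status = new_status
            changed.append(record.name)
    return changed


if __name__ == "__main__":
    records = [TableRecord("orders"), TableRecord("customers")]
    # A faulty API response that lists no tables marks everything as Deleted.
    print(reconcile(records, set()))
    # The next good sync restores both tables.
    print(reconcile(records, {"orders", "customers"}))
```

If permission records were cleaned up as part of the first pass, the second pass would restore the tables without their permissions, which is the failure mode this issue is concerned about.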
Describe the solution you'd like
Some ideas of what we can do:
- Do nothing. There will be stale tables and permission records in RDS, but this avoids the risk of removing permissions inappropriately and should not greatly affect the logic of data.all
- Implement some type of garbage collection. Delete tables and the associated shares on those tables after the tables have been in status `Deleted` for some extended period of time (e.g. 30 days)
- As soon as a table status gets updated to `Deleted`, still show it on the UI but with a `Deprecated` flag, and with the only option for the user being to delete the table and clean up shares (already a pre-req to table delete)
  - Remove from Catalog (already done by sync), prevent all new shares, and allow only share revocation and clean-up / delete of the table
  - If the next sync restores the Glue table back to `InSync`, allow normal activity again (no longer Deprecated)
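To make the second and third ideas concrete, here is a minimal sketch that combines a grace period with restricted actions for deprecated tables. Everything in it is an assumption for discussion: `TableRecord`, the `deleted_at` field, `apply_sync`, `allowed_actions`, `eligible_for_cleanup`, and the action names are hypothetical and do not exist in data.all today; only the `Deleted` / `InSync` statuses and the 30-day example come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Set

GRACE_PERIOD = timedelta(days=30)  # example value from the proposal


@dataclass
class TableRecord:
    """Hypothetical table row with a timestamp for when it went missing."""
    name: str
    status: str = "InSync"
    deleted_at: Optional[datetime] = None


def apply_sync(record: TableRecord, exists_in_glue: bool, now: datetime) -> None:
    """Update status from one sync result without touching permissions."""
    if exists_in_glue:
        # Table is back (or never left): resume normal activity.
        record.status = "InSync"
        record.deleted_at = None
    elif record.status != "Deleted":
        # First sync that misses the table: start the grace period.
        record.status = "Deleted"
        record.deleted_at = now
    # Already Deleted and still missing: keep the original timestamp.


def allowed_actions(record: TableRecord) -> Set[str]:
    """Deprecated tables only allow revoking shares and deleting the table."""
    if record.status == "Deleted":
        return {"revoke_share", "delete_table"}
    return {"create_share", "revoke_share", "delete_table"}


def eligible_for_cleanup(record: TableRecord, now: datetime) -> bool:
    """True once a table has stayed Deleted for the whole grace period."""
    return (
        record.status == "Deleted"
        and record.deleted_at is not None
        and now - record.deleted_at >= GRACE_PERIOD
    )
```

With this shape, a single bad sync only starts the clock; permissions and shares are removed only if the table stays missing for the full grace period, and a later successful sync resets the record to `InSync`.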
Adding to the backlog, to be picked up when we have some capacity.