
Better handling of "out of sync" Tables

Open noah-paige opened this issue 9 months ago • 1 comments

Is your idea related to a problem? Please describe. data.all already has logic for when a user manually deletes a table (shares are checked, the table is removed, permissions are cleaned up, etc.), but we should think about how best to handle table syncs (I am not sure what the correct answer is here). For instance:

  • Either a user starts a manual sync or the scheduled table_syncer ECS task runs (ultimately both call DatasetTableService.sync_existing_tables() at some point)
  • If the Glue table exists in data.all but does not exist in the Glue response --> we update the table status to Deleted
  • On the UI we no longer show those tables, because we filter for != Deleted (ref: DatasetService.paginated_dataset_tables())
  • But these tables still have associated permission records for who should be able to access the table
    • And if we clean them up right away but the table DOES still exist in Glue (e.g. some other error in the API response incorrectly returns 0 tables), then we have potentially just broken existing access or shares if the table re-appears the next time a user hits Sync
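To make the scenario above concrete, here is a minimal, self-contained sketch of the behaviour described. It is not the actual data.all code: `TableRecord`, its fields, and the standalone `sync_existing_tables` / `visible_tables` functions are hypothetical stand-ins for the database model, `DatasetTableService.sync_existing_tables()` and the `!= Deleted` filter, and restoring a re-appearing table to `InSync` is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TableRecord:
    """Hypothetical stand-in for a data.all table row plus its permission records."""
    name: str
    status: str = "InSync"
    permissions: list = field(default_factory=list)


def sync_existing_tables(records, glue_table_names):
    """Mark records missing from the Glue response as Deleted.

    Permission records are deliberately left untouched, which is the
    "stale permissions" situation described in this issue.
    """
    glue_names = set(glue_table_names)
    for record in records:
        # Assumption: a table that re-appears in Glue goes back to InSync.
        record.status = "InSync" if record.name in glue_names else "Deleted"
    return records


def visible_tables(records):
    """Mimic the UI filter that hides tables whose status is Deleted."""
    return [r for r in records if r.status != "Deleted"]


tables = [
    TableRecord("orders", permissions=["team-a"]),
    TableRecord("customers", permissions=["team-b"]),
]

# A faulty Glue response that incorrectly returns zero tables.
sync_existing_tables(tables, [])
hidden_but_permissioned = [t.name for t in tables if t.permissions]

# The next sync returns the real tables again.
sync_existing_tables(tables, ["orders", "customers"])
restored = [t.name for t in visible_tables(tables)]
```

Because the permission records survive the faulty sync, the tables come back with access intact. Eager cleanup after the first sync would have broken `team-a` and `team-b` access.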

Describe the solution you'd like Some ideas of what we can do:

  1. Do nothing. There will be stale tables and permission records in RDS but it avoids risk of removing permissions inappropriately and should not greatly affect logic of data.all
  2. Implement some type of garbage collection. Delete tables and the associated shares on those tables after the tables have been in Deleted status for some extended period of time (e.g. 30 days)
  3. As soon as a table status gets updated to Deleted, still show it on the UI but with a Deprecated flag, where the only option for the user is to delete the table and clean up its shares (already a pre-req to table delete)
    • Remove from Catalog (already done by sync), prevent all new shares, allow only share revocation and clean-up / deletion of the table
    • If the next sync restores the Glue table to InSync, allow normal activity again (no longer Deprecated)
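Options 2 and 3 could be sketched roughly as follows. This is only an illustration under assumptions: the `deleted_at` timestamp, the `RETENTION` constant, the action names, and both helper functions are hypothetical and do not exist in data.all today.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical retention window, taken from the "30 days" example in option 2.
RETENTION = timedelta(days=30)


@dataclass
class TableRecord:
    """Hypothetical table row; deleted_at is an assumed new column."""
    name: str
    status: str
    deleted_at: Optional[datetime] = None


def tables_to_purge(records, now, retention=RETENTION):
    """Option 2: pick tables that have been Deleted for at least `retention`."""
    return [
        r for r in records
        if r.status == "Deleted"
        and r.deleted_at is not None
        and now - r.deleted_at >= retention
    ]


def allowed_actions(record):
    """Option 3: a Deleted (Deprecated) table only allows clean-up actions."""
    if record.status == "Deleted":
        return {"revoke_share", "delete_table"}
    return {"create_share", "revoke_share", "delete_table"}


now = datetime(2024, 6, 15)
records = [
    TableRecord("old_table", "Deleted", deleted_at=datetime(2024, 5, 1)),
    TableRecord("recent_table", "Deleted", deleted_at=datetime(2024, 6, 10)),
    TableRecord("healthy_table", "InSync"),
]

purge_names = [r.name for r in tables_to_purge(records, now)]
```

With these inputs only `old_table` is past the retention window. `recent_table` stays in place, so a later sync can still restore it to InSync with its shares intact.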


noah-paige, May 02 '24 13:05

Adding to the backlog, to be picked up when we have some capacity

noah-paige, May 21 '24 01:05