iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Spark: Remove backup table after a successful migrate action.

Open sririshindra opened this issue 2 years ago • 9 comments

Once the MigrateTable action is called, a backup table is created. The backup table is used to restore the original table in case the MigrateTable action fails. This restoration is done by renaming the backup table to the original table. Therefore if the migrate operation fails there is no backup table lying around in the catalog. However, if the migration goes through successfully, the backup table is not cleaned up leaving the user to find a new unknown table in their catalog.

This PR cleans up the backup table after the migrate table action succeeds.

I tested this manually and I also added additional assertions to the existing unit tests to verify that the backup table is cleaned up properly.

sririshindra avatar Aug 23 '22 16:08 sririshindra

@aokolnychyi , @rdblue , @danielcweeks Could you please take a look at this PR if you get a chance? Thanks.

sririshindra avatar Aug 26 '22 00:08 sririshindra

I did notice it the other day, and also thought it would be a good usability feature (especially as there is snapshot table that keeps the original table). I wonder if need to have a drop_backup flag or something, in case someone depends on original behavior somehow? Let's see if everyone is ok with it.

szehon-ho avatar Aug 26 '22 03:08 szehon-ho

@RussellSpitzer were you the original author of the migrate action (#1525)? Was the purpose of the backup table simply for restoring it in case the migrate failed? If so, it would make sense to remove the backup if the migrate succeeded, right?

wypoon avatar Aug 26 '22 19:08 wypoon

I'm -1 on this. Not removing the backup table is intentional so that the user could choose which table to move forward with. Our docs included a step to "remove the backup table" once everything was running smoothly.

I'm fine adding this as an option, but I don't think it should be the default.

rdblue avatar Aug 28 '22 22:08 rdblue

@rdblue can you please point me to the doc with a step to remove the backup table? I do not see it in https://iceberg.apache.org/docs/latest/spark-procedures/#migrate.

wypoon avatar Aug 29 '22 03:08 wypoon

@rdblue In that case how would you feel about letting the user decide on the name of the backup table instead of the generic backup suffix appended to the table name? I think we should at least document this behavior so that users are not surprised to see a new table in their catalog that they did not create. I can also do a documentation update if that is what is more appropriate.

sririshindra avatar Aug 29 '22 18:08 sririshindra

Docs are fine, but to me it would make more sense to add a non-default flag to drop_backup (as mentioned above), if we want to do add a feature here. I dont see letting the user decide name for backup is that helpful?

szehon-ho avatar Aug 29 '22 18:08 szehon-ho

Docs are fine, but to me it would make more sense to add a non-default flag to drop_backup (as mentioned above), if we want to do add a feature here. I dont see letting the user decide name for backup is that helpful?

Sounds good. @szehon-ho I will update the PR to add the drop_table flag and make it optional. Thanks.

sririshindra avatar Aug 29 '22 19:08 sririshindra

I understand the thinking behind keeping the backup table after a successful migrate action and having the user decide what to do with it. If somehow this isn't already documented, then @sririshindra you could open a doc PR to document it. I think giving the user an option to remove the backup table automatically upon a successful migrate action is still useful. We can have a boolean parameter, defaulting to false, for removing the backup table, and pass it in the migrate procedure. I'm neutral on having an option for naming the backup table.

Edit: I hadn't updated my browser and didn't see @szehon-ho's comment before posting; I agree with him.

wypoon avatar Aug 29 '22 21:08 wypoon

@szehon-ho Please take another look at this PR when you get a chance. Thanks.

sririshindra avatar Oct 05 '22 19:10 sririshindra

@szehon-ho Gentle ping to review this PR. Thanks.

sririshindra avatar Oct 12 '22 17:10 sririshindra

Thanks for the changes, will merge tomorrow if no further comments.

szehon-ho avatar Oct 18 '22 00:10 szehon-ho

Merged, thanks @sririshindra for the changes

szehon-ho avatar Oct 18 '22 18:10 szehon-ho