
dbx sync can in theory experience unrecoverable errors if changes occur while syncing

Status: Open. matthayes opened this issue 3 years ago • 3 comments

Expected Behavior

Changing the state of the files/directories while dbx sync commands are running should never result in the command failing.

Current Behavior

In theory, the following situation seems possible:

  • Start running a dbx sync command on a root directory.
  • Copy a subdir with many files into this root directory, which will cause the sync to start copying.
  • While the sync runs, delete the subdir and all its files.
  • Sync will likely fail when it encounters a file it intended to sync that no longer exists.
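The failure mode above can be sketched without dbx at all. This is a minimal, deterministic simulation, not dbx's code: `take_snapshot` and `upload_all` are hypothetical helpers, and instead of a truly concurrent delete the subdir is removed between the "plan" step (snapshot) and the "upload" step, which is the window the race needs.

```python
import os
import shutil
import tempfile
from typing import Dict, List


def take_snapshot(root: str) -> Dict[str, float]:
    """Map every file under `root` to its modification time (hypothetical helper)."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            snapshot[path] = os.path.getmtime(path)
    return snapshot


def upload_all(paths: List[str]) -> int:
    """Stand-in for the upload step: read each file and return the bytes read."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += len(f.read())
    return total


race_error = None
with tempfile.TemporaryDirectory() as root:
    subdir = os.path.join(root, "subdir")
    os.mkdir(subdir)
    for i in range(3):
        with open(os.path.join(subdir, f"file{i}.txt"), "w") as f:
            f.write("data")

    # 1. The sync decides what to upload from a snapshot of the tree...
    to_upload = sorted(take_snapshot(root))
    # 2. ...the user deletes the subdir before the uploads run...
    shutil.rmtree(subdir)
    # 3. ...so the upload step hits a file that no longer exists.
    try:
        upload_all(to_upload)
    except FileNotFoundError as exc:
        race_error = exc

print(type(race_error).__name__)  # FileNotFoundError
```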

I've heard reports of occasional 404 and "No such file or directory" errors when working in a git repo, and I suspect this is the cause.

Steps to Reproduce (for bugs)

I have not tried to reproduce yet, but I suspect the above steps will cause the tool to fail, given what I know of the code.

Context

This is likely a simple fix: catch errors such as these when calling incremental_sync and rerun incremental_sync. The rerun takes a fresh directory snapshot and compares it against the previous snapshot. Because snapshots are only saved after a successful sync, the worst case is that some operations are repeated.
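A minimal sketch of that retry, under stated assumptions: `incremental_sync` here is a hypothetical stand-in that diffs a fresh snapshot against the previous one and returns the new snapshot only if every upload succeeds; it is not dbx's real function or signature. `sync_with_retry`, `take_snapshot`, and `max_attempts` are also illustrative names. Only uploads of new or changed files are modelled, not remote deletions.

```python
import os
import shutil
import tempfile
from typing import Callable, Dict

Snapshot = Dict[str, float]


def take_snapshot(root: str) -> Snapshot:
    """Map every file under `root` to its modification time."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            snapshot[path] = os.path.getmtime(path)
    return snapshot


def incremental_sync(root: str, previous: Snapshot, upload: Callable[[str], None]) -> Snapshot:
    """Hypothetical stand-in: upload new/changed files, return the new snapshot."""
    current = take_snapshot(root)
    for path, mtime in current.items():
        if previous.get(path) != mtime:
            upload(path)
    return current  # only reached if every upload succeeded


def sync_with_retry(root: str, previous: Snapshot, upload: Callable[[str], None],
                    max_attempts: int = 3) -> Snapshot:
    """Rerun incremental_sync when a file vanishes mid-sync."""
    last_error: Exception = RuntimeError("max_attempts must be >= 1")
    for _ in range(max_attempts):
        try:
            return incremental_sync(root, previous, upload)
        except FileNotFoundError as exc:
            # `previous` is untouched, so the next pass re-walks the tree and
            # diffs against the last successful snapshot.
            last_error = exc
    raise last_error


with tempfile.TemporaryDirectory() as root:
    keep_path = os.path.join(root, "keep.txt")
    with open(keep_path, "w") as f:
        f.write("data")
    subdir = os.path.join(root, "subdir")
    os.mkdir(subdir)
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(subdir, name), "w") as f:
            f.write("data")

    uploaded = []

    def flaky_upload(path: str) -> None:
        # Simulate the user deleting `subdir` while the first pass is running.
        if os.path.isdir(subdir):
            shutil.rmtree(subdir)
        with open(path, "rb") as f:  # FileNotFoundError for subdir files on pass 1
            f.read()
        uploaded.append(path)

    new_snapshot = sync_with_retry(root, {}, flaky_upload)
```

In this run the first pass uploads `keep.txt`, then fails on a deleted subdir file; the second pass no longer sees the subdir and uploads `keep.txt` again. That duplicate upload is the "repeat some operations" worst case described above.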

Your Environment

  • dbx version used:
  • Databricks Runtime version:

matthayes, Jun 08 '22 05:06

Hi @matthayes, any plans to address this issue? I can't reproduce it in my environment, and I've never seen any other reports of it.

renardeinside, Aug 25 '22 22:08

Yes, I plan to investigate and fix this. However, I think it's quite rare: I've only heard of it happening a couple of times, even with heavy usage.

matthayes, Aug 27 '22 04:08

Note that if this does occur, the resolution is simply to start the dbx sync process again; it resumes from where it left off. The fix would be to catch the exception and do the same thing automatically, restarting the sync from its last saved state.
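Resuming works because the last good state survives a failed pass. A minimal sketch of that persistence, assuming a JSON file of path-to-mtime entries: the file format, `load_snapshot`, `save_snapshot`, and the `snapshot.json` name are hypothetical, not how dbx actually stores its state. The write goes to a temp file and is swapped in with `os.replace`, so a crash mid-write leaves the previous snapshot intact.

```python
import json
import os
import tempfile
from typing import Dict

Snapshot = Dict[str, float]


def load_snapshot(state_file: str) -> Snapshot:
    """Return the last saved snapshot, or an empty one on the very first run."""
    try:
        with open(state_file) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_snapshot(state_file: str, snapshot: Snapshot) -> None:
    """Write atomically: a crash mid-write leaves the previous snapshot intact."""
    directory = os.path.dirname(state_file) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(snapshot, f)
    os.replace(tmp_path, state_file)


with tempfile.TemporaryDirectory() as state_dir:
    state_file = os.path.join(state_dir, "snapshot.json")
    first_run = load_snapshot(state_file)          # {} on the first run
    save_snapshot(state_file, {"src/a.py": 1.0})   # only after a successful pass
    # A later pass that fails never calls save_snapshot, so a restart
    # diffs against the last good snapshot and resumes from there.
    resumed = load_snapshot(state_file)
```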

matthayes, Aug 27 '22 04:08