
Bulk deletion of files when underlying storage disappears

Open techman83 opened this issue 4 years ago • 3 comments

Describe the bug: This might be better as a feature request, but when syncing a large number of deletions it may be prudent to check that the storage can be accessed. That said, the storage disappearing from underneath the client is probably a rather edge-case scenario, and I'm not sure it could even be handled gracefully!

2021-10-17 06:03:27 sync INFO: Deleting 49773/49779...
2021-10-17 06:03:29 manager INFO: Up to date
2021-10-17 06:03:33 manager INFO: Syncing...
2021-10-17 06:03:33 sync INFO: Fetching remote changes
2021-10-17 06:03:35 sync INFO: Indexing 993...
2021-10-17 06:03:35 sync INFO: Applying deletions...
2021-10-17 06:03:35 sync INFO: Deleting 1/1...
2021-10-17 06:03:37 sync INFO: Deleting 8/337...
2021-10-17 06:03:39 sync INFO: Deleting 235/337...
2021-10-17 06:03:41 sync INFO: Deleting 110/345...
2021-10-17 06:03:43 sync INFO: Deleting 337/345...
2021-10-17 06:03:45 manager ERROR: Cannot create cache directory: Transport endpoint is not connected
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/maestral/sync.py", line 1144, in _new_tmp_file
    with NamedTemporaryFile(dir=self.file_cache_path, delete=False) as f:
  File "/usr/lib/python3.6/tempfile.py", line 690, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/usr/lib/python3.6/tempfile.py", line 401, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
OSError: [Errno 107] Transport endpoint is not connected: '/data/Dropbox/Dropbox/.maestral.cache/tmp324zkcdb'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/maestral/manager.py", line 791, in _handle_sync_thread_errors
    yield
  File "/usr/local/lib/python3.6/dist-packages/maestral/manager.py", line 636, in download_worker
    self.sync.download_sync_cycle(client)
  File "/usr/local/lib/python3.6/dist-packages/maestral/sync.py", line 2836, in download_sync_cycle
    downloaded = self.apply_remote_changes(changes)
  File "/usr/local/lib/python3.6/dist-packages/maestral/sync.py", line 3004, in apply_remote_changes
    for n, r in enumerate(res):
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/dist-packages/maestral/sync.py", line 3328, in _create_local_entry
    res = self._on_remote_file(event, client)
  File "/usr/local/lib/python3.6/dist-packages/maestral/sync.py", line 3384, in _on_remote_file
    tmp_fname = self._new_tmp_file()
  File "/usr/local/lib/python3.6/dist-packages/maestral/sync.py", line 1155, in _new_tmp_file
    "Please check if you have write permissions for "
maestral.errors.CacheDirError: Cannot create cache directory: Transport endpoint is not connected. Please check if you have write permissions for /data/Dropbox/Dropbox/.maestral.cache.
2021-10-17 06:03:45 manager INFO: Shutting down threads...
2021-10-17 06:03:45 sync INFO: Sync aborted
2021-10-17 06:03:45 manager INFO: Paused

To Reproduce

  1. Point Maestral at remote storage
  2. Disconnect storage from underneath the OS

Expected behaviour: Maestral should fail gracefully if it can't read or write to the configured Dropbox location.

System:

  • Maestral version: 1.5.0
  • Python version: 3.6.9
  • OS: Ubuntu 18.04.6 LTS
  • Desktop environment: N/A - headless
  • PyQt version (for Linux GUI): N/A

Additional context: These are glusterfs bricks mounted via glusterfs 3.13.2. It's all pretty low-performance and sketchy (a couple of BananaPi boards, built back when not many SBCs had SATA, plus an x86 box based on an AMD Jaguar). It has been working really well, albeit a little slowly due to the storage bottleneck, since the early releases of Maestral after Dropbox decided to end support for non-ext4 drives.

The only trouble is that the rewind feature is either currently broken or can't deal with a changeset of 60,000+ files, so I'm waiting for support to roll it back (though I keep multiple offline external backups of the files as well).

Additional additional context: Dropbox support were pretty responsive and rolled things back to just before it happened, so no harm done. The setup has run like this for a very long time, including with the official Dropbox client before it got kneecapped. It's possible I've just gotten lucky, but I know for sure the storage layer has gone wonky at least a few times and the client just bailed out; I re-mounted and restarted the services. This is the first time I've encountered anything beyond a sync conflict, and it happened pretty soon after upgrading to 1.5.0.

techman83 avatar Oct 19 '21 08:10 techman83

After the rollback I got around to turning Maestral back on. Before that I dug into the filesystem, and the gluster host it was mapped against was quite wonky, so the FS was kinda mapped, but kinda not.

2021-10-21 10:11:10 sync INFO: Up to date
2021-10-21 10:11:10 sync INFO: Indexing local changes...
2021-10-21 10:29:11 sync INFO: Syncing ↑ 1/1
2021-10-21 10:29:11 manager INFO: Up to date
2021-10-21 10:29:12 manager INFO: Syncing...
2021-10-21 10:29:14 sync INFO: Syncing ↑ 1/2
2021-10-21 10:29:15 manager INFO: Up to date
2021-10-21 10:29:16 manager INFO: Syncing...
2021-10-21 10:29:16 sync INFO: Fetching remote changes
2021-10-21 10:29:16 sync INFO: Indexing 1...
2021-10-21 10:29:16 sync INFO: Syncing ↓ 1/1
2021-10-21 10:29:17 sync INFO: Up to date
2021-10-21 10:29:17 manager INFO: Up to date

If this is well in "Won't Fix" territory, I don't mind if it's closed. If it happens again I might dive into the deletion code and see if there is a way to guard against it :slightly_smiling_face:

techman83 avatar Oct 21 '21 02:10 techman83

Just to confirm that I understand the issue: The storage got disconnected and Maestral handled this as if the local files were deleted instead of raising an error about the local Dropbox folder itself no longer being accessible?

If this is the case, could you show the logs from the point just before the deletions occurred? In principle, there are two circumstances in which local files will be flagged as deleted:

  1. We receive inotify events of file deletions. If this set of events includes the Dropbox folder itself, Maestral will bail out early with an error message before deleting anything on the server.
  2. We perform startup indexing and notice that files or folders which are present in our database no longer exist on the drive. We assume that those were deleted by the user while Maestral was not running. If the Dropbox folder itself no longer exists, we again bail out with an error message.

The first case should not apply to you, since an inaccessible file system should not send any inotify messages. The second case currently will flag false deletions if the local Dropbox folder becomes inaccessible during the index process. This needs to be fixed...
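
For illustration only, a minimal sketch of what guarding the startup indexing could look like, assuming a hypothetical find_local_deletions helper (names and structure are made up, not Maestral's actual code): before flagging missing entries as deleted, verify that the Dropbox root itself is still reachable.

import os


def find_local_deletions(dropbox_path, indexed_paths):
    """Hypothetical sketch: flag indexed entries as deleted only if the
    Dropbox root itself is still accessible; otherwise raise instead of
    queueing mass deletions."""
    try:
        # A stale mount (e.g. errno 107, "Transport endpoint is not
        # connected") raises OSError here rather than looking empty.
        os.stat(dropbox_path)
        os.listdir(dropbox_path)
    except OSError as exc:
        raise RuntimeError(
            f"Dropbox folder {dropbox_path!r} is not accessible: {exc}"
        ) from exc

    # Only entries missing from an accessible root count as deletions.
    return [p for p in indexed_paths if not os.path.lexists(p)]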

samschott avatar Oct 21 '21 14:10 samschott

It was a bit curlier than that. The error I noticed came from a speed test I had set up: /bin/bash: /data/Dropbox/Dropbox/speedtest.csv: Transport endpoint is not connected. But on checking, I could still list the files and interact with the file system. Nothing in the gluster logs on either side revealed anything, and remounting didn't help either. I restarted the service on the host it was connected to and it came good.

It was the second case that occurred: either systemd restarted the process, or it re-indexed on reconnecting.

2021-10-16 23:35:47 manager INFO: Up to date
2021-10-17 00:13:44 manager INFO: Connection lost
2021-10-17 00:13:44 manager INFO: Shutting down threads...
2021-10-17 00:13:44 sync INFO: Sync aborted
2021-10-17 00:13:44 manager INFO: Paused
2021-10-17 00:13:44 manager INFO: Connecting...
2021-10-17 00:21:03 sync INFO: Fetching remote changes
2021-10-17 00:21:05 sync INFO: Indexing local changes...
2021-10-17 00:51:44 sync INFO: Uploading deletions...
2021-10-17 00:52:01 sync INFO: Deleting 1/49779...

OK, so in trying to explain this, I worked out what happened: it's a consequence of how glusterfs works and a poor configuration choice I made when I set it up 5 years ago.

stor2:/backups                       3.7T  3.6T  132G  97% /data/backups <- replicated
stor2:/Dropbox                       7.3T  7.0T  289G  97% /data/Dropbox <- striped

In my infinite wisdom, I configured it as a striped volume. I think the volume is supposed to go offline if a peer disappears (which is why it couldn't be written to, yet remained visible and mountable), but half the files would indeed be missing.

So potentially checking that the path can be written to before performing a big delete might help in this edge case (see the sketch after the quote below). But it's worth noting that striped is now a deprecated brick configuration and looks to have been replaced with dispersed.

  • Dispersed – Dispersed volumes are based on erasure codes, providing space-efficient protection against disk or server failures. It stores an encoded fragment of the original file to each brick in a way that only a subset of the fragments is needed to recover the original file. The number of bricks that can be missing without losing access to data is configured by the administrator at volume creation time.
  • Striped [Deprecated] – Striped volumes stripe data across bricks in the volume. For best results, you should use striped volumes only in high-concurrency environments accessing very large files.
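
To make the suggestion above concrete, here is a hedged sketch of a pre-flight write check before a large delete batch; the helper names and the threshold are invented for illustration and are not Maestral's actual API.

import os
import tempfile


def storage_is_writable(dropbox_path):
    """Return True if a temporary file can be created and removed inside
    the Dropbox folder. A wonky network mount (e.g. a gluster peer that
    has gone away) typically fails here with an OSError."""
    try:
        fd, tmp_name = tempfile.mkstemp(dir=dropbox_path)
        os.close(fd)
        os.unlink(tmp_name)
        return True
    except OSError:
        return False


def apply_deletions(dropbox_path, deletions, threshold=1000):
    """Hypothetical guard: refuse to upload a huge batch of deletions
    when the local Dropbox folder cannot even be written to."""
    if len(deletions) >= threshold and not storage_is_writable(dropbox_path):
        raise RuntimeError(
            f"Refusing to sync {len(deletions)} deletions: "
            f"{dropbox_path!r} is not writable"
        )
    # ... proceed with the real deletion logic here ...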

techman83 avatar Oct 22 '21 00:10 techman83

Closing since this is indeed a very niche issue and in the Won't Fix territory for me.

samschott avatar May 16 '23 20:05 samschott