
Recursive sync stops if a child sync fails with a 'critical error'

Open scratchings opened this issue 3 years ago • 0 comments

We have a large recursive sync (ca. 70 sub-filesystems) that is part of a two-hop backup, S > A > B.

We are seeing a long-running transfer (file system 'e') from A > B hold up transfers from S > A. File systems that sort alphabetically before 'e' are kept up to date on A, but when Syncoid gets to 'e' we see:

    cannot destroy 'e@autosnap_2021-11-11_08:59:00_hourly': dataset is busy
    cannot receive incremental stream: dataset is busy
    'e' does not have any resumable receive state to abort
    CRITICAL ERROR: zfs receive -A 'e' failed: 256 at /usr/local/bin/syncoid line 1941.

At this point the recursive sync stops and the subsequent file systems are not updated. This only appears to happen when 'e' is the source of a send/receive from A to B. If 'e' is receiving from S, Syncoid seems to be OK with skipping over the in-progress file system and continuing with the others.
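
For illustration only, the behaviour we would like (skip the busy child, carry on with the rest) can be sketched as a wrapper that runs one non-recursive syncoid per child. This is not how Syncoid works internally. The names `sync_each` and `run_syncoid` are hypothetical, the dataset names are made up, and it assumes that syncoid exits non-zero when a transfer fails and that every dataset name starts with `source_root`.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple


def run_syncoid(source: str, target: str) -> int:
    """Run one non-recursive syncoid transfer and return its exit code.

    Assumption: syncoid is on PATH and exits non-zero on failure.
    """
    return subprocess.run(["syncoid", source, target]).returncode


def sync_each(
    datasets: Iterable[str],
    source_root: str,
    target_root: str,
    sync_one: Callable[[str, str], int] = run_syncoid,
) -> List[Tuple[str, int]]:
    """Sync every dataset in alphabetical order and keep going on failure.

    Returns a list of (dataset, exit_code) pairs for the transfers that failed.
    """
    failures: List[Tuple[str, int]] = []
    for name in sorted(datasets):
        # Map e.g. "tank/data/e" under "tank/data" to "backup/data/e".
        target = target_root + name[len(source_root):]
        try:
            code = sync_one(name, target)
        except OSError:
            # The command could not be started at all; record and continue.
            code = -1
        if code != 0:
            # A busy or failing child is recorded but does not stop the loop.
            failures.append((name, code))
    return failures
```

The list of children would come from something like `zfs list -H -o name -r <pool>`, and the returned failures could be logged or retried on the next run.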

scratchings, Nov 12 '21 10:11