sanoid
Recursive sync stops if a child sync fails with a 'critical error'
We have a large recursive sync (about 70 child filesystems) that is part of a two-hop backup, S > A > B.
We are seeing a problem where a long-running transfer of filesystem 'e' from A > B holds up transfers from S > A. Filesystems that sort alphabetically before 'e' are kept up to date on A, but when Syncoid reaches 'e' we see:
cannot destroy 'e@autosnap_2021-11-11_08:59:00_hourly': dataset is busy
cannot receive incremental stream: dataset is busy
'e' does not have any resumable receive state to abort
CRITICAL ERROR: zfs receive -A 'e' failed: 256 at /usr/local/bin/syncoid line 1941.
At this point the recursive sync stops and the remaining filesystems are not updated. This only appears to happen when 'e' is the source of a send/receive from A to B. If 'e' is instead receiving from S, Syncoid seems to be OK with skipping over the in-progress filesystem and continuing with the others.
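
As a possible stopgap, the behaviour we would like (carry on past a failing child) can be approximated by driving Syncoid once per dataset instead of using a single recursive run. The following is only a rough sketch, not something taken from Syncoid itself. It assumes the script runs on the host that holds the source datasets, that target names mirror the source layout, and that Syncoid exits non-zero when a sync fails. The function names `list_datasets` and `sync_each` and the dataset paths are hypothetical.

```python
import subprocess


def list_datasets(root, run=subprocess.run):
    """Return `root` and its descendant filesystems/volumes using `zfs list`."""
    result = run(
        ["zfs", "list", "-H", "-o", "name", "-t", "filesystem,volume", "-r", root],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def sync_each(source_root, target_root, datasets, run=subprocess.run):
    """Run one non-recursive syncoid per dataset; collect failures instead of stopping."""
    failed = []
    for name in datasets:
        # Assumption: the target mirrors the source layout under target_root.
        target = target_root + name[len(source_root):]
        result = run(["syncoid", name, target])
        if result.returncode != 0:
            # Record the failure (e.g. a busy dataset) and move on to the next one.
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hypothetical dataset names, for illustration only.
    datasets = list_datasets("tank/data")
    failures = sync_each("tank/data", "backup/data", datasets)
    if failures:
        print("failed datasets:", ", ".join(failures))
```

With this approach a busy 'e' would be reported at the end while the filesystems after it are still attempted. It loses whatever ordering or bookkeeping the recursive mode does internally, so it is a workaround rather than a fix.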