sanoid
Make separate syncoid exit code for transient errors
I run syncoid in an A -> B -> C configuration and frequently get failures when running because either the target dataset on B "is already target of a zfs receive process" or because "dataset is busy". These are transient problems because A is syncing to B while B is also syncing to C, so I'd like to distinguish between these and real, actual errors that won't resolve themselves on a subsequent run. Would you accept a PR that sets the non-zero exit codes as follows?
exit code 1 - these might be intentional in some cases (e.g. ignoring an empty parent dataset when using -r):
warn "CRITICAL: no snapshots exist on source $sourcefs, and you asked for --no-sync-snap.\n";
warn "WARN: --no-sync-snap is set, and getnewestsnapshot() could not find any snapshots on source for current dataset. Continuing.\n";
exit code 2 - as noted in the above A -> B -> C scenario, these could be normal during multiple syncs:
warn "Cannot sync now: $targetfs is already target of a zfs receive process.\n";
print "WARN: resetting partially receive state\n";
exit code 3 - anything else (these are actual errors)
By separating out exit codes as outlined above, I would only need to investigate when syncoid exits with 3, since I know that "dataset is busy" and the other transient errors will resolve themselves in a later run.
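To make the proposal concrete, here is a minimal sketch of how a wrapper script might act on these exit codes. The mapping reflects only the scheme proposed in this issue, not behavior that syncoid is known to have, and the names %PROPOSED_EXIT and classify_status are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical mapping of the exit codes PROPOSED in this issue.
# This is an assumption for illustration, not documented syncoid behavior.
my %PROPOSED_EXIT = (
    0 => 'ok',
    1 => 'skipped',    # e.g. no snapshots on source with --no-sync-snap
    2 => 'transient',  # e.g. target is already receiving
    3 => 'error',      # anything else
);

# Classify a raw wait status as returned by system() or found in $?.
sub classify_status {
    my ($wait_status) = @_;
    return 'error' if $wait_status == -1;     # command could not be started
    return 'error' if $wait_status & 127;     # command was killed by a signal
    my $code = $wait_status >> 8;             # real exit code is in the high byte
    return $PROPOSED_EXIT{$code} // 'error';  # unknown codes count as real errors
}
```

A cron wrapper could then call system() on syncoid, pass the returned status to classify_status, and alert only when the result is 'error', leaving 'transient' results to the next scheduled run.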
I would love to see this as well.
I implemented this in the above PR for these transient errors:
warn "Cannot sync now: $targetfs is already target of a zfs receive process.\n";
After looking into it more closely, I realized that we cannot distinguish the "WARN: resetting partially receive state" case (aka "dataset is busy") from other CRITICAL errors, because this error originates in the ZFS command itself, which doesn't signal it with a distinct exit code. We could search for this string in $stdout and use a different exit code if it is found, but I'm concerned that this could mask more severe errors (e.g. if the ZFS command printed both "WARN: resetting partially receive state" and another, more severe error).
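For illustration only, one conservative way to approach the string-matching idea is to treat a failure as transient only when every non-blank output line matches a known transient message, so that any unrecognized line falls back to a real error. This is a sketch and not what the PR does: the function name is_transient_only is hypothetical, the patterns are taken from the messages quoted in this thread, and the exact wording that ZFS prints is an assumption.

```perl
use strict;
use warnings;

# Message fragments treated as transient in this sketch. They come from the
# messages quoted in this thread; the real ZFS wording may differ (assumption).
my @TRANSIENT = (
    qr/is already target of a zfs receive process/,
    qr/dataset is busy/,
);

# Return 1 only if the captured output is non-empty and EVERY non-blank line
# matches a transient pattern. Any unrecognized line yields 0, so a more
# severe error printed next to a transient one is not masked.
sub is_transient_only {
    my ($output) = @_;
    my @lines = grep { /\S/ } split /\n/, $output;
    return 0 unless @lines;
    for my $line (@lines) {
        return 0 unless grep { $line =~ $_ } @TRANSIENT;
    }
    return 1;
}
```

The trade-off is that this check is deliberately strict: any extra informational line in the output would also cause the failure to be reported as a real error.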
Moreover, the following cases should be handled for empty parent datasets by setting the syncoid:sync property or using --exclude, so we should maintain the existing behavior for them:
warn "CRITICAL: no snapshots exist on source $sourcefs, and you asked for --no-sync-snap.\n";
warn "WARN: --no-sync-snap is set, and getnewestsnapshot() could not find any snapshots on source for current dataset. Continuing.\n";