borg icon indicating copy to clipboard operation
borg copied to clipboard

more specific return code / rc / error codes of the borg process

Open ThomasWaldmann opened this issue 3 years ago • 36 comments
trafficstars

Currently, borg has only a few, rather generic return codes: 0 ok, 1 warning, 2 error, >=128 for termination due to signals.

This could be improved (preferably with a breaking release) by moving some stuff to more specific codes. Everything that has no specific code assigned stays at the current code.

First task is to go through the issue tracker and link to all issues talking about (more specific) error codes from here.

Then it needs some planning: what makes sense, what does not, ...

When planning is finished and there is some consensus, implementation can follow.

ThomasWaldmann avatar Jun 06 '22 09:06 ThomasWaldmann

I'll update this list here with whatever we come up with:

# generic return codes used by borg
0 OK
1 WARNING
2 ERROR
# 3 .. 63 reserved for more specific errors
3 repository is locked
4 connection error  # e.g. ssh returned 255 (other return codes == ?)
5 quota limit reached / no space left in file system
...
# 64 .. 127 reserved for more specific warnings
...
128+N killed by signal N (N == 0..64 for posix signals / realtime signals?)
# 200 .. 255 reserved for informative RCs (like "0 OK", but more specifically telling something)

If a wrapper is unaware of a specific return code, it must treat it like 2 ERROR and notify the user.

An updated wrapper at least needs to implement a range check for the new error / warning / info ranges.

ThomasWaldmann avatar Jun 06 '22 09:06 ThomasWaldmann

I'll update this list with other related issues we find:

  • #61
  • #3446

ThomasWaldmann avatar Jun 06 '22 09:06 ThomasWaldmann

I'd like to suggest an enhancement of error codes.

We should at least reserve or define a broader range of codes that could be implemented in future versions, for example:

1-63: warning (1 meaning "general warning" as it is now) 64-127: error (64 meaning "general error")

(128+N stays unchanged)

If we don't want to implement more detailed return values yet, at least we would change rc 2 to 64. Users would have to change their wrappers from rc 2 to rc >= 64 to recognize errors and could keep the logic for warning.

If we want, we could already define some sub-ranges, e.g. "network errors", "disk related", "quota related" and the like.

fantasya-pbem avatar Jun 06 '22 09:06 fantasya-pbem

No, rc 0 1 2 need to stay as is. There are enough free codes without changing these.

@fantasya-pbem I've updated https://github.com/borgbackup/borg/issues/6756#issuecomment-1147245734 .

ThomasWaldmann avatar Jun 06 '22 09:06 ThomasWaldmann

hmm, if we only reserve space for warnings/errors, we only have 0 for success / information. not sure if that is a problem / limitation in practice.

ThomasWaldmann avatar Jun 06 '22 10:06 ThomasWaldmann

We could reserve the range from 128+64 onwards, maybe 200+ for possible success codes, if we ever have the need for this. Posix signals / real-time signals have max code 32 / 64.

fantasya-pbem avatar Jun 06 '22 10:06 fantasya-pbem

@m3nu @sophie-h ^^^

ThomasWaldmann avatar Jun 06 '22 11:06 ThomasWaldmann

This proposal collects possible error/warning cases that can occur. Some may not be detectable or make no sense with the current implementation.

Specific errors

3 unknown cli option 4 pattern syntax error 5 repository not found 6 archive not found 7 locked repo (taken from #5158) 10 borg init unsupported encryption mode or key algorithm 11+ more init or repo related 15 borg check issues could not be fixed 16+ more check related 20 borg create failed, checkpoint archive left 21+ more create related 30 repository quota reached 31 backup disk full 32 decryption failure 33 decompression failure 34 checksum failure (if applicable) 35 missing chunk 36 segment not found 37 hash not found in index 38 unsupported key 39+ more disk or data structure related 50 remote error due to network error like "broken pipe" (taken from #1424) 51+ more specific SSH errors, needs investigation (taken from #3939)

Specific warnings

64 cannot combine cli options 65 lock removed 66+ more lock related 70 borg check --max-duration stopped 71 borg check --repair fixed issues (taken from #3446) 72 repository check found issues (only useful if repo check issue prevents archive check) 73 archive check found issues (only useful if archive check issue prevents verify) 74 verify check found issues 75+ more check related 80 cache rebuilt occurred / cache out of sync 81+ more cache related 85 borg create could not find root path (inspired by #6708) 86 borg create could not read files 87 file changed during borg create (inspired by #6657) 88+ more create related

Comments welcome!

fantasya-pbem avatar Jun 06 '22 11:06 fantasya-pbem

For errors (fatal stuff that leads to immediate termination) we can deal rather easily with them: raise an Exception, let the exit handler deal with the exception.

For warnings (stuff borg notices while processing, but does not immediately terminate), we could encounter multiple different warnings. How to deal with that? Collect them in a list? If len(warnings) > 1, just give rc 1, otherwise the more specific code? Same for informations.

ThomasWaldmann avatar Jun 06 '22 11:06 ThomasWaldmann

No, rc 0 1 2 need to stay as is.

Thanks for the heads-up! We mostly use 0, 1 and 2 at the moment, but the new ones are for sure interesting to provide better feedback in the UI. Tracking adjustments here: https://github.com/borgbase/vorta/issues/1349

m3nu avatar Jun 06 '22 11:06 m3nu

At first glance, this seems redundant to message ids. Pika is already making use of them. Therefore I don't see the use case yet.

(I would like to add msg ids to a bunch of existing warnings though :laughing:)

sophie-h avatar Jun 08 '22 19:06 sophie-h

Use case is for the shell people that don't want to parse JSON. Thank you for the message ids link, I wasn't aware of that yet. We could associate some of those IDs to return codes.

fantasya-pbem avatar Jun 09 '22 08:06 fantasya-pbem

New errors and warnings tables, return codes and message IDs associated (best guess).

Code Message ID Error
3+ CLI/patterns
Unknown CLI option
Pattern syntax error
8+ Repository
Repository.AlreadyExists Repository {} already exists.
Repository.CheckNeeded Inconsistency detected. Please run “borg check {}”.
Repository.DoesNotExist Repository {} does not exist.
Repository.InsufficientFreeSpaceError Insufficient free space to complete transaction (required: {}, available: {}).
Repository.InvalidRepository {} is not a valid repository. Check repo config.
Repository.AtticRepository Attic repository detected. Please run “borg upgrade {}”.
Repository.ObjectNotFound Object with key {} not found in repository {}.
Repository quota reached
Backup disk full
Missing chunk
Segment not found
Hash not found in index
20+ Archive
Archive.AlreadyExists Archive {} already exists
Archive.DoesNotExist Archive {} does not exist
Archive.IncompatibleFilesystemEncodingError Failed to encode filename “{}” into file system encoding “{}”. Consider configuring the LANG environment variable.
borg create failed, checkpoint archive left
25+ Key/Security
KeyfileInvalidError Invalid key file for repository {} found in {}.
KeyfileMismatchError Mismatch between repository {} and key file {}.
KeyfileNotFoundError No key file for repository {} found in {}.
PassphraseWrong passphrase supplied in BORG_PASSPHRASE is incorrect
PasswordRetriesExceeded exceeded the maximum password retries
RepoKeyNotFoundError No key entry found in the config of repository {}.
UnsupportedManifestError Unsupported manifest envelope. A newer version is required to access this repository.
UnsupportedPayloadError Unsupported payload type {}. A newer version is required to access this repository.
NotABorgKeyFile This file is not a borg key backup, aborting.
RepoIdMismatch This key backup seems to be for a different backup repository, aborting.
UnencryptedRepo Keymanagement not available for unencrypted repositories.
UnknownKeyType Keytype {0} is unknown.
40+ Locking
LockError Failed to acquire the lock {}. (solves #5158)
LockErrorT Failed to acquire the lock {}. (solves #5158)
45+ Caching
Cache.CacheInitAbortedError Cache initialization aborted
Cache.EncryptionMethodMismatch Repository encryption method changed since last access, refusing to continue
Cache.RepositoryAccessAborted Repository access aborted
Cache.RepositoryIDNotUnique Cache is newer than repository - do you have multiple, independently updated repos with same ID?
Cache.RepositoryReplay Cache is newer than repository - this is either an attack or unsafe (multiple repos with same ID)
50+ Check
borg check issue could not be fixed
55+ Misc
Buffer.MemoryLimitExceeded Requested buffer size {} is above the limit of {}.
ExtensionModuleError The Borg binary extension modules do not seem to be properly installed
IntegrityError Data integrity error: {}
NoManifestError Repository has no manifest.
PlaceholderError Formatting Error: “{}”.format({}): {}({})
Decryption failure
Decompression failure
64+ Network/Remote
ConnectionClosed Connection closed by remote host (solves #1424)
InvalidRPCMethod RPC method {} is not valid
PathNotAllowed Repository path not allowed
RemoteRepository.RPCServerOutdated Borg server is too old for {}. Required version {}
UnexpectedRPCDataFormatFromClient Borg {}: Got unexpected RPC data format from client.
UnexpectedRPCDataFormatFromServer Got unexpected RPC data format from server: {}
80+ More specific SSH errors, needs investigation (taken from #3939)
Code Message ID Warning
100+ CLI/patterns
Cannot combine cli options
103+ Repository
Repository.ObjectNotFound Object with key {} not found in repository {}.
106+ Locking
Lock removed
110+ Caching
Cache rebuilt occurred / cache out of sync
115+ Check
borg check --max-duration stopped
borg check --repair fixed issues (taken from #3446)
Repository check found issues (only useful if repo check issue prevents archive check)
Archive check found issues (only useful if archive check issue prevents verify)
Verify check found issues
borg create could not find root path (inspired by #6708)
borg create could not read files
File changed during borg create (inspired by #6657)

fantasya-pbem avatar Jun 14 '22 09:06 fantasya-pbem

Is there some systematic in the Code (guess it means the return code, rc) column (except that >= 64 is a warning)?

Where do the numbers seen there come from?

ThomasWaldmann avatar Jun 17 '22 16:06 ThomasWaldmann

Not yet. Now, after having integrated the message IDs, the codes have to be distributed more evenly. But first I want to check these IDs - are they all really errors that abort Borg or are some of them progress messages? Second, after we have found that the suggested groups make sense and we have defined all possible (and recognizable) error / warning conditions, we can distribute the return codes appropriately.

fantasya-pbem avatar Jun 17 '22 16:06 fantasya-pbem

I found that most message IDs are just raised errors, but at least Repository.ObjectNotFound is used in a warning context, too (like in archive.delete() where this error is catched and just ignored). After reordering the groups there seem to exist much more errors than warnings, so I suggest to have the warnings starting with return code 100 (or at least 80) instead of 64.

fantasya-pbem avatar Jun 19 '22 11:06 fantasya-pbem

It's not unusual to have different warnings like "File changed during borg create" or "File not found" during one create. I'm not sure how return codes would handle that.

Overall, I'm a bit scared of the shell script that will handle all of these errors. Since the number of exit codes is quite limited I don't really see that this concept scales. I would suggest writing down the use case more specifically to see how this could actually work.

sophie-h avatar Jun 29 '22 19:06 sophie-h

Would be good to have this in the beta releases, which will likely come next after alpha4 (released today).

ThomasWaldmann avatar Jul 17 '22 14:07 ThomasWaldmann

@fantasya-pbem are you still working on this?

ThomasWaldmann avatar Aug 22 '22 16:08 ThomasWaldmann

No, and nobody besides me seems to have interest, too. My "work" has been collecting possible error codes,the real work had to be done by developers of Borg. Feel free to close this issue.

fantasya-pbem avatar Aug 22 '22 16:08 fantasya-pbem

@fantasya-pbem Well, my impression was that there definitely is some interest (see also misc. other related tickets) and I was primarily waiting for you to finish your work, see https://github.com/borgbackup/borg/issues/6756#issuecomment-1159068059 .

I guess I or someone else could then implement this, but having a good plan first is essential. In case we get some funds on the bountysource org account, there could be even one or two bounties.

Personally I will be rather busy with #6987 for a while, so help here is appreciated!

ThomasWaldmann avatar Aug 23 '22 16:08 ThomasWaldmann

See https://github.com/borgbackup/borg/issues/6756#issuecomment-1159689734. Also: It seems that warning return codes are less useful as there is likely more than one condition to be represented. Error codes make more sense here I guess, maybe we could concentrate on errors first. Then, the next task would be to check the code where these errors can occur: Is there an exception thrown / is Borg aborted? If so, the respective error code could be returned directly.

fantasya-pbem avatar Aug 23 '22 16:08 fantasya-pbem

Yes, we can just leave warnings away for now to simplify the task.

So, only error return codes for conditions that lead to a rather immediate termination. Usually this means either some exception that gets handled by the toplevel handler in borg.archiver or a command handler returning with some error code.

ThomasWaldmann avatar Aug 23 '22 19:08 ThomasWaldmann

Just wanted to ask if somebody wants to either work on this or put a bounty on it?

Due to compatibility reasons I guess the best point to introduce such a change would be with borg 2.0 (which is a breaking change anyway).

But for that to happen, work on this would have to be done rather soon or it won't be in 2.0.

ThomasWaldmann avatar Jan 29 '23 17:01 ThomasWaldmann

I'ld recommend doing this in this order:

  • updating the docs about ERROR return codes (some planning was done above, review/improve/discuss it), PR
  • adjusting the code to match the docs, PR
  • think more about how to deal with WARNING return codes (esp. considering there might be many warnings in a single run)
  • update docs about WARNING return codes, PR
  • adjust the code to match the docs, PR

ThomasWaldmann avatar Apr 03 '23 12:04 ThomasWaldmann

ERRORs are rather simple, WARNINGs more difficult / unclear yet how to best deal with.

ThomasWaldmann avatar Apr 03 '23 12:04 ThomasWaldmann

ERRORs are rather simple

I think you have to walk through the relevant parts of the code base and specify the return code for a the different problems resulting in an error. Or is a their a way to easily map existing exceptions and message ids to return codes in one place?

real-yfprojects avatar Apr 03 '23 12:04 real-yfprojects

@real-yfprojects yes. it's not necessarily an exception, can be also just an if+ return.

ThomasWaldmann avatar Apr 03 '23 12:04 ThomasWaldmann

Are there plans to have a own RC for the borg create message "file changed while we backed it up", which comes now up with RC1?

mai-schlau avatar Oct 23 '23 18:10 mai-schlau

@mai-schlau that is a warning (not: error) message and considering there might be multiple and different warnings (plus potentially even some real error), this is a more difficult case than the "real error" return codes and IIRC we did not discuss yet how to handle that.

ThomasWaldmann avatar Oct 23 '23 18:10 ThomasWaldmann

As we have multiple systems, which produce this "file changed while we backed it up" and which still have to be backuped it would be very helpful if these messages would produce a different RC than 1. So it would be easier and more handy to distinguish this occurrence from others like 'no such path' which in our environment requests action.

But maybe, and this is only a simple suggestion, something like RC 1-1 for the "changed while" and a simple 1 for the rest would be an approach. Further RC1s like 'no such path' could also come up with RC 1-2 and maybe the list could be filled till 1-9.

Especially the RC1-1 for the "file changed while we backed it up", would make the messages and our monitoring a lot clearer ...

mai-schlau avatar Oct 25 '23 14:10 mai-schlau

idea about how to deal with warnings

As there can be multiple warnings in a borg run, it is difficult to map all this information to a few bits in a 8bit return code.

But what we could do is:

  • if there is an error, set error to its return code (> 0).
  • collect all warnings in a list.
  • to start simple, the elements would be numerical return codes.
  • later, if needed, the elements could also be (rc, warning message) tuples.
  • compute borg process rc like this:
def compute_rc(error, warnings):
    if error > 0:
        # there was a fatal error, return its error code
        return error
    if not warnings:
        # great: no error and also no warnings
        return 0
    rcs = set(warnings)
    if len(rcs) == 1:
        # easy: there was only one kind of warning, so we can be specific
        return rcs[0]
    # there were different kinds of warnings
    return 1  # generic warning rc, user has to look into the logs

We could think about whether it makes sense to have the same rc for different kinds of similar warnings, so that the specific warning rc case in that computation is used more frequently than the fallback to rc 1 ("find it out yourself!")

ThomasWaldmann avatar Nov 09 '23 01:11 ThomasWaldmann