docker-volume-backup

Only backup if volume content has changed

Open LuukVerhagen opened this issue 1 month ago • 6 comments

Is your feature request related to a problem? Please describe. I have Docker volumes that can easily stay unchanged for weeks, if not months, so overwriting the backups in AWS S3 when the content hasn't changed is inefficient and mostly a waste of my money.

Describe the solution you'd like I have no solution I like, but some ideas are: using a non-cryptographic hash function to check for file changes, or comparing the Docker volume size.

Describe alternatives you've considered I looked into using a custom command, but as far as I know it doesn't support halting the execution of the backup.

Additional context I do not use the internal cron-job functionality. Instead, I use an external script that attaches the Docker container to each of the existing volumes, detaches it again, and uploads all the archives to AWS S3.

LuukVerhagen avatar Dec 03 '25 14:12 LuukVerhagen

This has come up in one form or another (#94) already. I do think it makes sense as a feature, but it adds something to this tool that hasn't been there before: statefulness. How would you envision storing the checksum of an archive? In a sibling file next to the archive?

m90 avatar Dec 03 '25 18:12 m90

You could either store the checksum in the volume itself or in a dedicated offen/docker-volume-backup volume, the latter making more sense I think.

LuukVerhagen avatar Dec 03 '25 18:12 LuukVerhagen

If I store the checksum in the volume itself, it might taint the result of the checksum operation, no?
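To illustrate the concern, here is a minimal Python sketch. The helper `tree_checksum`, the `.checksum` file name, and the choice of SHA-256 are assumptions made for this example and not part of the tool. It shows that a checksum file written into the hashed directory changes the next checksum unless that file is explicitly excluded.

```python
import hashlib
import os
import tempfile


def tree_checksum(root, exclude=()):
    """Hash relative paths and contents of all files under root, in sorted order.

    Files whose base name is listed in `exclude` are skipped.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # keep the traversal order deterministic
        for name in sorted(filenames):
            if name in exclude:
                continue
            full = os.path.join(dirpath, name)
            digest.update(os.path.relpath(full, root).encode("utf-8") + b"\0")
            with open(full, "rb") as fh:
                digest.update(fh.read())
    return digest.hexdigest()


# A temporary directory stands in for the volume.
volume = tempfile.mkdtemp()
with open(os.path.join(volume, "data.txt"), "w") as fh:
    fh.write("payload")

before = tree_checksum(volume)

# Store the checksum inside the volume itself (hypothetical file name).
with open(os.path.join(volume, ".checksum"), "w") as fh:
    fh.write(before)

tainted = tree_checksum(volume)  # now also hashes .checksum
excluded = tree_checksum(volume, exclude={".checksum"})

print(tainted == before)   # False
print(excluded == before)  # True
```

So storing the checksum in the volume could work, but only if the checksum file is filtered out of both the checksum calculation and, presumably, the archive.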

Mounting a dedicated "data" volume would definitely work, although I currently have mixed feelings about starting to become stateful. I'll think on this for a bit. If you have any other ideas, please feel free to drop them here.

m90 avatar Dec 03 '25 18:12 m90

I can't think of a way that doesn't bring some form of statefulness into the game, except maybe comparing the size of the last result to the current result.

LuukVerhagen avatar Dec 04 '25 11:12 LuukVerhagen

When I say "statefulness" I am referring to local state, sorry if that wasn't clear. The backups themselves will always be stateful, which I guess is what we want :)

This is how I think I would implement such a feature:

  • There is a new BACKUP_USE_CHECKSUMS setting that defaults to false
  • When set to true, the following steps will happen in addition to the current behavior:
    • Determine the filename the archive will use this time (i.e. interpolate time and other variables)
    • Calculate a checksum for all (untarred, uncompressed) contents that would be backed up. This uses the same filtering rules as archive creation
    • Check whether a $DESIRED_FILENAME.checksum file exists on the storage backend (this will probably also be configurable, but that doesn't matter right now)
    • In case such a file exists and its contents match the previously calculated checksum, print a message about how there is nothing to do and exit early
    • In case the checksum file does not exist or its contents do not match the local checksum, proceed as before
    • Once the archive has been uploaded successfully, create or update a sibling checksum file containing the checksum that matches the archive
    • When pruning archives, check whether a sibling checksum file exists for each pruned archive and delete it as well
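
The steps above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the tool's implementation: the function names `content_checksum` and `backup_if_changed`, the use of SHA-256, and the plain dictionary standing in for the storage backend are all made up for the example. Only the `.checksum` sibling file naming comes from the list.

```python
import hashlib
import os


def content_checksum(root):
    """Hash relative paths and contents of all files under root, in sorted order."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # keep the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            digest.update(os.path.relpath(full, root).encode("utf-8") + b"\0")
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()


def backup_if_changed(source_dir, archive_name, storage, create_archive):
    """Return True if an archive was uploaded, False if the run was skipped.

    storage: dict mapping remote file names to bytes (stand-in for a backend).
    create_archive: callable that returns the archive bytes for source_dir.
    """
    checksum = content_checksum(source_dir).encode("ascii")
    checksum_name = archive_name + ".checksum"

    # Exit early if the remote checksum file exists and matches.
    if storage.get(checksum_name) == checksum:
        print("contents unchanged, nothing to do")
        return False

    # Missing or different checksum: proceed as before.
    storage[archive_name] = create_archive(source_dir)
    # Written only after the archive has been stored successfully.
    storage[checksum_name] = checksum
    return True
```

Calling `backup_if_changed` twice on an unchanged directory with the same archive name would upload on the first call and skip on the second.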

Theoretically, it would also be possible to use a different approach for each storage backend (e.g. storing the checksum in S3 Metadata instead of a file), but I feel this makes things needlessly complicated. Another option would be using a checksums.txt file as produced by md5sum or similar, but then I would be worried this makes pruning more complicated than it needs to be.
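
A small Python sketch of the pruning difference, again with a dictionary standing in for the storage backend. The function names are hypothetical, and the manifest is assumed to use md5sum's text-mode line format (checksum, two spaces, file name).

```python
def prune_with_siblings(storage, archive_name):
    """Delete an archive and its sibling checksum file, if present."""
    storage.pop(archive_name, None)
    storage.pop(archive_name + ".checksum", None)


def prune_with_manifest(storage, archive_name, manifest_name="checksums.txt"):
    """Delete an archive and rewrite the shared manifest without its line."""
    storage.pop(archive_name, None)
    manifest = storage.get(manifest_name, b"").decode("utf-8")
    # Keep every line whose file name column is not the pruned archive.
    kept = [
        line
        for line in manifest.splitlines()
        if line.split(maxsplit=1)[1:] != [archive_name]
    ]
    body = "\n".join(kept) + "\n" if kept else ""
    storage[manifest_name] = body.encode("utf-8")
```

With sibling files, pruning is two independent deletes. With a shared manifest, every prune becomes a read, filter, and rewrite of one file that all archives depend on.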

Would such a setup cover your use case?

m90 avatar Dec 04 '25 13:12 m90

Yes, this approach would cover my use case.

LuukVerhagen avatar Dec 09 '25 07:12 LuukVerhagen