
`prune` command

Open pfernie opened this issue 3 years ago • 7 comments

I have (just) published a crate, rdedup-prune, which implements a prune command following the semantics of attic's prune; I use it to maintain my archives. It relies on a specific naming scheme, `<prefix>-<timestamp>`, for example "foo-2020-08-04-16-23-00". It simply ignores (does not remove) names that do not match the format.
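To make the naming scheme concrete, here is a minimal sketch of how a name could be split into prefix and timestamp. It is not the code from rdedup-prune: the type `NameTimestamp` and the function `parse_archive_name` are hypothetical, and the six-field `YYYY-MM-DD-HH-MM-SS` layout is assumed from the single example name above.

```rust
/// Timestamp fields taken from an archive name (hypothetical type for this sketch).
/// Deriving `Ord` on fields in this order gives chronological ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct NameTimestamp {
    pub year: u32,
    pub month: u32,
    pub day: u32,
    pub hour: u32,
    pub minute: u32,
    pub second: u32,
}

/// Splits `name` into `(prefix, timestamp)` when it ends in six numeric,
/// dash-separated fields. Returns `None` for any other name so the caller
/// can ignore it instead of removing it.
pub fn parse_archive_name(name: &str) -> Option<(&str, NameTimestamp)> {
    // Split from the right: six timestamp fields, then whatever is left is
    // the prefix (which may itself contain dashes).
    let mut parts = name.rsplitn(7, '-');
    let mut fields = [0u32; 6];
    for slot in fields.iter_mut().rev() {
        let part = parts.next()?;
        if part.is_empty() || !part.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        *slot = part.parse().ok()?;
    }
    let prefix = parts.next()?;
    if prefix.is_empty() {
        return None;
    }
    let [year, month, day, hour, minute, second] = fields;
    // Range checks only; this sketch does not validate days per month.
    let in_range = (1..=12).contains(&month)
        && (1..=31).contains(&day)
        && hour <= 23
        && minute <= 59
        && second <= 59;
    if !in_range {
        return None;
    }
    Some((prefix, NameTimestamp { year, month, day, hour, minute, second }))
}
```

Names such as "foo" or "notes-final" yield `None` here, which corresponds to the "ignore, don't remove" behavior described above.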

This is built on rdedup-lib, and would be easy to integrate directly into rdedup. It's fine as a standalone tool, and I think it would only benefit from direct integration if timestamp metadata were added as part of the repository format. This would allow pruning without a specific name format. The external implementation meets my needs, however.

pfernie avatar Aug 04 '20 22:08 pfernie

I'm open to having such a thing integrated directly, but why does it have to use a naming scheme? The creation date etc. could be stored in the name file itself, along with any tags like "weekly", "monthly", etc.

I need to double check what we store there already... haven't been paying attention to this project for a long while.

dpc avatar Aug 04 '20 22:08 dpc

The current naming scheme "restriction" exists because I don't believe the metadata currently contains the timestamp (but maybe I overlooked it?). So, agreed, if that metadata were (or is) included, that would be the superior way to do it (although I already name my archives this way, so the restriction isn't a problem for me personally). I did intentionally avoid e.g. checking creation timestamps on the name .yml files, as I didn't regard those as reliable.

pfernie avatar Aug 04 '20 23:08 pfernie

$ cat foo/0000000000000000-c39fdb79bc3faa16/name/foo.yml 
---
digest: d202d7951df2c4b711ca44b4bcc9d7b363fa4252127e058c1a910ec05b6cd038
index_level: 0

Please add any metadata you want that is missing (date mostly?), backfilling the date during deserialization with the filesystem creation/modification date (for backward compatibility), and maybe also tags while at it? https://github.com/dpc/rdedup/blob/master/lib/src/name.rs
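A rough sketch of the backfill idea, using only the standard library. The function `created_or_backfill` is hypothetical and not part of rdedup-lib; it assumes the deserializer yields `None` for name files written before a creation date was stored.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Returns the creation time stored in the name metadata if there is one.
/// Otherwise falls back to the name file's filesystem creation time, or to
/// its modification time on platforms where creation time is unavailable.
pub fn created_or_backfill(
    stored: Option<SystemTime>,
    name_file: &Path,
) -> io::Result<SystemTime> {
    if let Some(created) = stored {
        // Newer name files: trust the value recorded in the metadata.
        return Ok(created);
    }
    // Older name files: backfill from the filesystem.
    let meta = fs::metadata(name_file)?;
    meta.created().or_else(|_| meta.modified())
}
```

The filesystem fallback is only as trustworthy as the file timestamps, which can change when a repository is copied or synced, so it is best treated as a compatibility path for old names.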

dpc avatar Aug 04 '20 23:08 dpc

If I get it right, we need the naming scheme for backups (correct me if one of the assumptions is wrong):

  • rdedup uses a key-value store for names
  • so every backup needs a new name anyway
  • parsing file content can be very costly (think of an rclone mount)

So +1 for merging the prune command.

geek-merlin avatar Oct 18 '20 16:10 geek-merlin

> I'm open to having such a thing integrated directly, but why does it have to use a naming scheme? The creation date etc. could be stored in the name file itself, along with any tags like "weekly", "monthly", etc.

That would mean that a prune command must read each and every name file, instead of just doing an ls. Is that wanted?

geek-merlin avatar Jan 12 '21 21:01 geek-merlin

> That would mean that a prune command must read each and every name file, instead of just doing a ls. Is that wanted?

Doesn't seem terrible, especially since the job of prune is to keep the number of things limited.

dpc avatar Jan 13 '21 00:01 dpc

For my usage, reading the metadata for each name is fine (it isn't costly), but there do seem to be cases where that might be undesirable. The existing behavior of the prune command relies on the naming convention, so I would be happy to support both behaviors. For example, we could give the prune command two mutually exclusive flags, --metadata and --timestamp-format. The former would read the actual created field from the metadata; the latter would use the current scheme, which requires a timestamp in the name.

Or the command would consult the metadata by default (as I personally think this is more "reliable"), with a --timestamp-format flag that overrides this default and uses the naming-convention scheme.
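A small sketch of how the two proposals could fit together. Nothing here exists in rdedup or rdedup-prune yet: `TimestampSource` and `resolve_timestamp_source` are hypothetical names, and the value passed to --timestamp-format is assumed to be a format string such as "%Y-%m-%d-%H-%M-%S".

```rust
/// Where prune takes an archive's timestamp from (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum TimestampSource {
    /// Read the `created` field from each name's metadata.
    Metadata,
    /// Parse the timestamp out of the name using the given format.
    NameFormat(String),
}

/// Turns the two proposed flags into a single choice.
/// `metadata` is true when --metadata was passed; `timestamp_format`
/// holds the value of --timestamp-format, if any.
pub fn resolve_timestamp_source(
    metadata: bool,
    timestamp_format: Option<String>,
) -> Result<TimestampSource, String> {
    match (metadata, timestamp_format) {
        // First proposal: passing both flags is an error.
        (true, Some(_)) => {
            Err("--metadata and --timestamp-format are mutually exclusive".to_string())
        }
        // An explicit format selects the naming-convention scheme.
        (false, Some(format)) => Ok(TimestampSource::NameFormat(format)),
        // Second proposal: metadata is the default when no format is given.
        (_, None) => Ok(TimestampSource::Metadata),
    }
}
```

With this shape, --metadata becomes optional, since leaving out both flags already selects the metadata path.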

pfernie avatar Jan 16 '21 19:01 pfernie