rdedup
`prune` command
I have (just) published a crate, rdedup-prune, which implements a `prune` command following the semantics of attic's `prune`, which I use to maintain my archives. It relies on a specific naming scheme, `<prefix>-<timestamp>`, for example `foo-2020-08-04-16-23-00`. Names that do not match the format are simply ignored (not removed).
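As a rough illustration of the naming rule above, here is a minimal Rust sketch (not rdedup-prune's actual code) that splits a name into its prefix and a `YYYY-MM-DD-HH-MM-SS` timestamp and returns `None` for anything else, so such names can be skipped rather than deleted. The exact timestamp layout is an assumption inferred from the `foo-2020-08-04-16-23-00` example, and `ArchiveName` and `parse_name` are hypothetical names.

```rust
/// A parsed archive name of the form `<prefix>-YYYY-MM-DD-HH-MM-SS`
/// (layout assumed from the example name in this thread).
#[derive(Debug, PartialEq)]
struct ArchiveName<'a> {
    prefix: &'a str,
    /// (year, month, day, hour, minute, second)
    timestamp: [u32; 6],
}

/// Returns `None` for names that do not match the format, so the caller
/// can skip them instead of deleting them.
fn parse_name(name: &str) -> Option<ArchiveName<'_>> {
    // The last six dash-separated fields are the timestamp; everything
    // before them (which may itself contain dashes) is the prefix.
    let mut parts = name.rsplitn(7, '-');
    let mut ts = [0u32; 6];
    for i in (0..6).rev() {
        let field = parts.next()?;
        let width = if i == 0 { 4 } else { 2 }; // YYYY, then two-digit fields
        if field.len() != width || !field.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        ts[i] = field.parse().ok()?;
    }
    // Reject obviously invalid dates and times.
    let [_, month, day, hour, minute, second] = ts;
    if !(1..=12).contains(&month)
        || !(1..=31).contains(&day)
        || hour > 23
        || minute > 59
        || second > 59
    {
        return None;
    }
    let prefix = parts.next().filter(|p| !p.is_empty())?;
    Some(ArchiveName { prefix, timestamp: ts })
}
```

Because the timestamp is kept as a `[u32; 6]` array, which Rust compares lexicographically, parsed names sort chronologically without needing a date library.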
This is built on rdedup-lib, and would be easy to integrate directly into rdedup. It's fine as a standalone tool, and I think it would only benefit from direct integration if timestamp metadata were added to the repository format, which would allow pruning without a specific name format. The external implementation meets my needs, however.
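For context on the "attic's prune semantics" mentioned above, here is a hedged Rust sketch of the core selection step as I understand attic's behavior: for each `--keep-*` rule, walk the archives newest first and keep the newest archive in each period (day, month, ...) until N periods are covered, where an archive already kept by an earlier rule marks its period as seen but does not count towards the next rule. The function names, the `Ts` type, and the restriction to daily and monthly rules are assumptions for illustration; every archive not returned would be deleted.

```rust
/// (year, month, day, hour, minute, second); arrays compare
/// lexicographically, so a greater value means a newer archive.
type Ts = [u32; 6];

/// Applies one `--keep-*` style rule: keeps the newest archive in each of
/// the `n` most recent periods. `archives` must be sorted newest first.
/// Archives already in `kept` (from an earlier rule) mark their period as
/// seen but do not count towards `n`.
fn prune_split<K: PartialEq>(
    archives: &[(String, Ts)],
    n: usize,
    period: impl Fn(&Ts) -> K,
    kept: &mut Vec<String>,
) {
    if n == 0 {
        return;
    }
    let mut last: Option<K> = None;
    let mut added = 0;
    for (name, ts) in archives {
        let key = period(ts);
        if last.as_ref() != Some(&key) {
            // First (i.e. newest) archive seen in this period.
            last = Some(key);
            if !kept.contains(name) {
                kept.push(name.clone());
                added += 1;
                if added == n {
                    break;
                }
            }
        }
    }
}

/// Hypothetical driver applying a daily rule, then a monthly rule.
/// Returns the names to keep; every other archive would be pruned.
fn select_kept(mut archives: Vec<(String, Ts)>, daily: usize, monthly: usize) -> Vec<String> {
    archives.sort_by(|a, b| b.1.cmp(&a.1)); // newest first
    let mut kept = Vec::new();
    prune_split(&archives, daily, |t| (t[0], t[1], t[2]), &mut kept);
    prune_split(&archives, monthly, |t| (t[0], t[1]), &mut kept);
    kept
}
```

Applying the daily rule before the monthly one means the monthly rule only adds archives from months not already represented by a daily keeper.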
I'm open to having such a thing integrated directly, but why does it have to use a naming scheme? The creation date etc. could be stored in the name file itself, along with any tags like "weekly", "monthly", etc.
I need to double-check what we store there already... I haven't been paying attention to this project for a long while.
The current naming-scheme "restriction" exists because I don't believe the metadata currently contains the timestamp (but maybe I overlooked it?). So, agreed: if that metadata were (or is) included, that would be the superior way to do it (although I already name my archives this way, so the restriction isn't a problem for me personally). I intentionally avoided, e.g., checking the creation timestamps of the name `.yml` files, as I didn't regard those as reliable.
```
$ cat foo/0000000000000000-c39fdb79bc3faa16/name/foo.yml
---
digest: d202d7951df2c4b711ca44b4bcc9d7b363fa4252127e058c1a910ec05b6cd038
index_level: 0
```
Please add any missing metadata you want (mostly the date?), backfilling the date during deserialization from the filesystem creation/modification date (for backward compatibility), and maybe also tags while you're at it: https://github.com/dpc/rdedup/blob/master/lib/src/name.rs
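One way the suggested backfill could look, as a minimal std-only Rust sketch: a hypothetical optional `created` field (Unix seconds) sits next to `digest` and `index_level`, and when an older name file lacks it, the file's modification time is used instead. This is not rdedup's `name.rs`; real code would presumably derive serde's `Deserialize` rather than use the toy `key: value` parser here, which only exists to keep the example self-contained. The sketch falls back to modification time because `std::fs::Metadata::created` returns an error on platforms or filesystems that don't record creation time.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Hypothetical extended contents of a `name/<name>.yml` file.
#[derive(Debug)]
struct NameMeta {
    digest: String,
    index_level: u32,
    /// Seconds since the Unix epoch; `None` in files written before
    /// this (assumed) field existed.
    created: Option<u64>,
}

/// Toy parser for the flat `key: value` lines of a name file; lines
/// without a colon (such as the `---` document marker) are ignored.
fn parse_meta(text: &str) -> Option<NameMeta> {
    let mut digest = None;
    let mut index_level = None;
    let mut created = None;
    for line in text.lines() {
        if let Some((key, value)) = line.split_once(':') {
            let value = value.trim();
            match key.trim() {
                "digest" => digest = Some(value.to_string()),
                "index_level" => index_level = value.parse().ok(),
                "created" => created = value.parse().ok(),
                _ => {}
            }
        }
    }
    Some(NameMeta { digest: digest?, index_level: index_level?, created })
}

/// Returns the stored `created` time, or backfills it from the name
/// file's modification time for older files without the field.
fn created_or_backfill(meta: &NameMeta, path: &Path) -> io::Result<u64> {
    if let Some(secs) = meta.created {
        return Ok(secs);
    }
    let mtime = fs::metadata(path)?.modified()?;
    // Times before the epoch are clamped to 0.
    Ok(mtime.duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0))
}
```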
If I get it right, we need the naming scheme for backups (correct me if one of these assumptions is wrong):
- rdedup uses a key-value store for names
- so every backup needs a new name anyway
- parsing file contents can have a huge cost (think `rclone mount`)
So +1 for merging the prune command.
> I'm open to having such a thing integrated directly, but why does it have to use a naming scheme? The creation date etc. could be stored in the name file itself, along with any tags like "weekly", "monthly", etc.
That would mean that a prune command must read each and every name file, instead of just doing an `ls`. Is that wanted?
> That would mean that a prune command must read each and every name file, instead of just doing an `ls`. Is that wanted?
Doesn't seem terrible, especially since the job of prune is to keep the number of things limited.
For my usage, reading the metadata for each name is fine (it isn't costly), but there do seem to be some cases where it might actually be undesirable. The existing behavior of the `prune` command relies on the naming convention, so I would be happy supporting both behaviors. For example, we could support two mutually exclusive flags on the `prune` command, `--metadata` and `--timestamp-format`. The first would read the actual `created` field from the metadata; the second would use the current scheme, which requires a timestamp in the name.
Alternatively, by default the command would consult the metadata (as I personally think this is more "reliable"), but a `--timestamp-format` flag would override this default and use the naming-convention scheme.
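A small Rust sketch of how the proposed flags could resolve to a timestamp source, combining both suggestions: metadata is the default, `--timestamp-format` switches to name parsing, and passing it together with `--metadata` is rejected. The flags are proposals from this thread, not existing rdedup options, and `TimestampSource` and `timestamp_source` are hypothetical names; a real CLI would more likely declare the conflict in an argument parser such as clap than scan the raw argument list as done here.

```rust
/// Where prune takes each archive's timestamp from (hypothetical).
#[derive(Debug, PartialEq)]
enum TimestampSource {
    /// Read the `created` field from the name's metadata (proposed default).
    Metadata,
    /// Parse the timestamp out of `<prefix>-<timestamp>` names (current behavior).
    NameFormat,
}

/// Resolves the source from the proposed flags; giving both is an error.
fn timestamp_source(args: &[&str]) -> Result<TimestampSource, String> {
    let metadata = args.contains(&"--metadata");
    let name_format = args.contains(&"--timestamp-format");
    match (metadata, name_format) {
        (true, true) => {
            Err("--metadata and --timestamp-format are mutually exclusive".to_string())
        }
        (false, true) => Ok(TimestampSource::NameFormat),
        // Neither flag, or only --metadata: use the metadata.
        _ => Ok(TimestampSource::Metadata),
    }
}
```

Making metadata the default keeps `--metadata` redundant but harmless, so scripts written against either proposal would behave the same.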