Rmlinting large separated repositories
I am sorry if this has an easy solution already; I did not find it. I wonder if it is safe (or even possible) to solve this case - maybe with the --replay option.
There are two machines, A and B, far, far away from each other. The task is to delete the files from A which are already on B (and transfer the others). It is very impractical to copy the whole of A to B and run rmlint locally. Is there a way for rmlint to scan the files on machine B, save that knowledge to a file, and then, with the file transferred to A, do the job there?
I'm pretty sure it's implemented already (although a bit clunky) - see #199
I remember we had a similar discussion in the issue @SeeSpotRun linked. To me this sounds like you're asking for something like rsync. Is there anything that speaks against using it?
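For completeness, a minimal rsync sketch of that workflow (all paths here are placeholders, and whether files that are already identical on B get removed from A depends on the rsync version, so always check with a dry run first):
$ rsync -a -c -n -v /data/on/A/ user@B:/data/on/B/ # dry run: -c compares by checksum, -n only reports what would happen
$ rsync -a -c --remove-source-files /data/on/A/ user@B:/data/on/B/ # transfers what is missing on B, then deletes the source copies on A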
Yes obviously ... it seems I'm starting to repeat myself :( Thanks
Is there an FAQ entry or something that describes this usage?
I guess I need to run rmlint -c json:unique -mk // /mnt/here on machine 1, upload the resulting file to machine 2, and run rmlint -mk /mnt/there // --replay rmlint.json?
Am I right?
Probably worth adding something to https://rmlint.readthedocs.io/en/latest/tutorial.html.
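For reference, on a single machine the documented replay workflow looks roughly like this (paths are placeholders):
$ rmlint /path/to/files # scans and writes rmlint.json
$ rmlint --replay rmlint.json /path/to/files # re-uses the cached results instead of re-hashing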
Actually the cross-machine version doesn't (yet) work, because rmlint --replay first checks that:
- the files listed in the .json cache still exist and
- still have the same mtime.
If you create a json cache on one machine and then remove duplicates on a second, rmlint on the second machine can't verify whether the files still exist on the first. We would need some sort of option to skip this check, like:
$ rmlint --no-verify-cache [--yes-I-am-really-sure] /path/to/local/files // --replay other_PC_cache.json -km
But it would be dangerous. For example, this would potentially generate a script to delete all your files:
$ rmlint /path/to/files --hash-uniques # generates rmlint.json
$ rmlint --no-verify-cache /path/to/local/files // --replay rmlint.json -km # everything will match!!
I'm not sure I want to go there.
Maybe a better alternative, now that NFS supports xattrs, would be:
On remote machine:
$ rmlint --xattr -T df --hash-uniques /path/to/files # generates xattr checksums locally on remote machine
On local machine:
$ sudo mount -t nfs remote:/path/to/files /mnt/remote
$ rmlint --xattr -T df --hash-uniques /local/path # generates xattr checksums on local machine
$ rmlint --xattr -km /local/path // /mnt/remote # should find xattr hash matches
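As a sanity check, the checksums rmlint cached can be inspected directly; note that xattrs over NFS generally require NFSv4.2 support on both client and server, and the attribute name pattern below (anything under user.rmlint) is an assumption that is easy to verify:
$ getfattr -d -m 'user.rmlint' /mnt/remote/somefile # dumps any user.rmlint.* attributes (somefile is a placeholder)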
So then the only NFS network traffic is to search for and stat the remote files, check their mtimes and read their xattrs.
I think that's more aligned with the rmlint philosophy.