
Restart rdfind job on interruption.

Open NathanielEvry opened this issue 5 years ago • 4 comments

Enable rdfind to be re-initiated on large jobs.

Right now I'm trying to scan an 8TB cold storage drive with Linux OS backups. It's taking literal days, and I've had to kill the job because there's no way to tell how much progress it's making.

NathanielEvry avatar Apr 14 '20 17:04 NathanielEvry

This is an interesting idea. I have thought about using an SQLite database as intermediate storage, both for handling low-RAM situations and for debugging, but it would also be useful in a scenario like this. The downside is the dependency on SQLite, plus the complexity it brings in (being able to resume is not trivial at all).

There is an existing issue, #30, about adding a progress bar. Would that help you?

Also, did you know that you can use lsof to see which files rdfind has open? With that tool it is easy to see that it is actually reading files.
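For anyone who wants the same information from a script, here is a minimal sketch of the idea. It is not lsof itself: it assumes Linux and reads the `/proc/<pid>/fd` symlinks directly, which is roughly the information `lsof -p <pid>` reports for regular files. The function name `open_files` is hypothetical, and you need permission to inspect the target process.

```python
import os


def open_files(pid):
    """Return paths of regular files that process `pid` has open (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    paths = []
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to whatever the descriptor refers to.
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were looking
        # Skip pipes, sockets and directories; keep regular files only.
        if target.startswith("/") and os.path.isfile(target):
            paths.append(target)
    return paths
```

Calling `open_files(pid)` every few seconds with the PID of a running rdfind process shows which file is currently being read, which gives a rough sense of where the scan is.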

pauldreik avatar Apr 14 '20 17:04 pauldreik

I'm also interested in this, though not for the progress bar (a progress bar would be useful too). I would like to be able to rerun rdfind repeatedly without recalculating hashes and other metadata for files that have not changed (same mtime/ctime, size, first/last byte, etc. since the last run), just to speed up the runs after the first one. This would be a significant time-saver and would increase the usefulness of the tool. Please consider raising the priority of this feature. Thank you!
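To make the request concrete, here is a minimal sketch of such a cache. This is not how rdfind works today and not a proposed patch: it assumes a JSON file as the cache, uses only size and mtime as the "unchanged" test, and picks SHA-1 purely for illustration. All names (`cached_hash`, `load_cache`, `save_cache`) are hypothetical.

```python
import hashlib
import json
import os


def file_signature(path):
    """Cheap metadata used to decide whether a file changed since the last run."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}


def sha1_of(path, chunk_size=1 << 20):
    """Hash the file contents in chunks so large files do not fill RAM."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def load_cache(cache_path):
    """Load the cache from a previous run, or start empty if there is none."""
    try:
        with open(cache_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def cached_hash(path, cache):
    """Return (digest, reused). Rehash only when size or mtime differ."""
    sig = file_signature(path)
    entry = cache.get(path)
    if entry is not None and entry["sig"] == sig:
        return entry["sha1"], True
    digest = sha1_of(path)
    cache[path] = {"sig": sig, "sha1": digest}
    return digest, False


def save_cache(cache_path, cache):
    """Write to a temp file, then rename, so an interrupted run keeps the old cache."""
    tmp = cache_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(cache, f)
    os.replace(tmp, cache_path)
```

Saving the cache periodically during a scan would also give a basic form of resume: after an interruption, files already hashed are skipped on the next run. The trade-off is that a file modified without changing its size or mtime would go unnoticed, which is why stricter checks such as ctime or first/last bytes are mentioned above.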

cub-uanic avatar Aug 13 '20 16:08 cub-uanic

I agree with this idea. I realize this issue is about 3.5 years old; any more thoughts? I have a large 22T ZFS digital hoard that I am trying to manage and de-duplicate. Being able to store portions in a DB and run rdfind on specific sub-directories would be a huge win. Also, any thoughts on parallelization?

I could pitch in with some direction; I have been reviewing the code.

saulwold avatar Jan 21 '24 23:01 saulwold

You might take a look at issues #31 and #100, which are related to caching.

There was some discussion of parallelization in issue #113.

fire-eggs avatar Jan 22 '24 14:01 fire-eggs