
robinhood v3: rbh-undelete does not restore empty files or symbolic links

Status: Open · geraldhofer opened this issue 8 years ago · 4 comments

If a user accidentally deletes a directory tree, it would be desirable to be able to restore that tree exactly as it was before it was deleted. That includes symbolic links, empty files and empty directories. Currently these objects are not stored in the rm database (the SOFT_RM table), so when they are deleted we lose the information the robinhood database held about them, and rbh-undelete cannot restore them.

geraldhofer, Feb 19 '17 21:02

Looking at parts of the code and configuration, I would guess that undelete can only restore files that were backed up or archived: files not selected for archival do not appear in the SOFT_RM table when they are removed. This makes some sense, as it avoids polluting the SOFT_RM table with files under TMPFS control.

tack0974, Feb 22 '17 17:02

The goal is to use Robinhood together with an HSM as protection against accidental deletions.

In version 2.5, the SOFT_RM table was only used to remove files from the backend with lhsm_remove, and it kept only the minimal data needed for that task. In v3 we now have all the data needed to properly restore a file, and a more elegant way to restore the stub with rbh-undelete. So that is the first step towards this goal.

It obviously does not make sense to move data that exists only in inodes into the HSM, as that would generate empty entries in the backend. So to achieve the full goal of being able to restore accidentally deleted data, we need to preserve all the information, including objects that live only in inodes.

So I would propose the following changes: a configuration setting that tells Robinhood to also write all deleted files into the SOFT_RM table (not only archived files). That setting would be turned on by default in lhsm.inc. rbh-undelete would then need to handle all the other file types that could end up in the SOFT_RM table.

We are usually dealing with the lhsm_remove policy only when running with lhsm. That policy can then quickly clean out the SOFT_RM entries the site does not find useful (for example, the /lustre/scratch area), while keeping the entries that should stay longer for the retention period (for example, the /lustre/archive area).
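
The per-area retention described above can be modeled as a path-prefix lookup. This is a minimal C sketch assuming each SOFT_RM entry carries its original path and removal time; the rule table, the prefixes and the retention values (immediate purge for scratch, about 90 days for archive, 7 days otherwise) are made-up examples. In practice a site would express this in its lhsm_remove policy configuration rather than in code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical per-area retention; the values are placeholders. */
struct retention_rule {
    const char *prefix;      /* path prefix, ending in '/' */
    time_t      keep_secs;   /* how long the SOFT_RM entry is kept */
};

static const struct retention_rule rules[] = {
    { "/lustre/scratch/", 0 },                  /* purge right away */
    { "/lustre/archive/", 90L * 24 * 3600 },    /* keep about 90 days */
};

static const time_t default_keep = 7L * 24 * 3600;  /* hypothetical default */

/* Return true if the SOFT_RM entry for `path`, removed at `rm_time`,
 * should be purged at time `now`. The first matching prefix wins. */
static bool should_purge(const char *path, time_t rm_time, time_t now)
{
    assert(path != NULL);
    time_t keep = default_keep;
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        if (strncmp(path, rules[i].prefix, strlen(rules[i].prefix)) == 0) {
            keep = rules[i].keep_secs;
            break;
        }
    }
    return now - rm_time >= keep;
}
```

The prefixes end in a slash so that a path such as /lustre/scratchy/f is not mistaken for the scratch area.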

The only issue might be normal files in the SOFT_RM table that are not archived. I think it is still helpful for the administrator to have these files in the SOFT_RM table. If rbh-undelete encounters a file that is not archived, it should (maybe optionally?) skip that file but report it, so an administrator knows that these files existed but had not made it into the archive yet. That is very useful information.
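
The proposed undelete behavior can be sketched as a per-entry decision plus a report. This is a hypothetical C model, not rbh-undelete's actual code, and all names are illustrative: archived files get their stub restored, directories, symlinks and empty files are recreated from metadata alone, and non-empty files that never reached the archive are skipped and listed for the administrator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum entry_type { ENTRY_FILE, ENTRY_DIR, ENTRY_SYMLINK };

struct soft_rm_entry {
    const char        *path;
    enum entry_type    type;
    bool               archived;   /* copy exists in the HSM backend */
    unsigned long long size;       /* size at removal time */
};

enum undelete_action { RESTORE_STUB, RECREATE, SKIP_AND_REPORT };

static enum undelete_action choose_action(const struct soft_rm_entry *e)
{
    assert(e != NULL);
    if (e->type != ENTRY_FILE)
        return RECREATE;            /* directories, symlinks: metadata only */
    if (e->archived)
        return RESTORE_STUB;        /* data can come back from the backend */
    if (e->size == 0)
        return RECREATE;            /* empty file: nothing to fetch */
    return SKIP_AND_REPORT;         /* data never made it into the archive */
}

/* Print every skipped entry so the administrator knows it existed;
 * return how many entries were skipped. */
static size_t report_unarchived(FILE *out, const struct soft_rm_entry *list, size_t n)
{
    size_t skipped = 0;
    for (size_t i = 0; i < n; i++) {
        if (choose_action(&list[i]) == SKIP_AND_REPORT) {
            fprintf(out, "skipped (never archived): %s\n", list[i].path);
            skipped++;
        }
    }
    return skipped;
}
```

Making the skip optional, as suggested, would amount to one more flag that turns SKIP_AND_REPORT into "recreate as an empty file and report".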

geraldhofer, Feb 22 '17 21:02

As recovering symlinks and directories is not related to Lustre/HSM (one may want to do the same without Lustre/HSM), I propose leaving lhsm.inc unchanged. To implement what you need, we just need a simple status manager that accepts all entries into SOFT_RM and can recreate them at undelete time (mkdir, symlink, ...). Let's call it "rmtracker" (any better name is welcome!). It would come with a specific ".inc" file to enable it, and it could be used with or without lhsm policies.
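
A minimal C sketch of what the recreate step of such an "rmtracker" status manager might do, using only standard POSIX calls (mkdir, symlink, mkfifo, open, lchown, chmod). The rm_record structure and rmtracker_recreate() are hypothetical; they assume SOFT_RM would store the object type, path, mode, owner and, for symlinks, the link target. Timestamps, extended attributes and Lustre striping are deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of what a SOFT_RM row would need to hold. */
enum rm_type { RM_FILE, RM_DIR, RM_SYMLINK, RM_FIFO };

struct rm_record {
    enum rm_type type;
    const char  *path;     /* original full path */
    mode_t       mode;     /* permission bits */
    uid_t        uid;
    gid_t        gid;
    const char  *target;   /* symlink target, NULL for other types */
};

/* Recreate one metadata-only object (no file data).
 * Returns 0 on success, -1 with errno set on failure. */
static int rmtracker_recreate(const struct rm_record *r)
{
    int rc;
    assert(r != NULL && r->path != NULL);
    switch (r->type) {
    case RM_DIR:
        rc = mkdir(r->path, r->mode);
        break;
    case RM_FILE: {          /* empty files only: there is no data to restore */
        int fd = open(r->path, O_WRONLY | O_CREAT | O_EXCL, r->mode);
        if (fd < 0)
            return -1;
        rc = close(fd);
        break;
    }
    case RM_FIFO:
        rc = mkfifo(r->path, r->mode);
        break;
    case RM_SYMLINK:
        if (r->target == NULL) {
            errno = EINVAL;
            return -1;
        }
        rc = symlink(r->target, r->path);
        break;
    default:
        errno = EINVAL;
        return -1;
    }
    if (rc != 0)
        return -1;
    /* Restore ownership without following symlinks. */
    if (lchown(r->path, r->uid, r->gid) != 0)
        return -1;
    /* Creation is filtered by umask, so set the mode explicitly.
     * chmod() would follow a symlink, so skip it for links. */
    if (r->type != RM_SYMLINK && chmod(r->path, r->mode) != 0)
        return -1;
    return 0;
}
```

chmod() runs after lchown() because changing ownership can clear set-user-ID bits. Directories must be recreated parent-first, for example by sorting records by path depth before calling the function.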

rbh-undelete already allows specifying which status manager to use for the recovery, so you could specify lhsm for archived files and rmtracker for the other entries.

tl-cea, Mar 23 '17 15:03

I managed to solve the problem, at least for empty files. The lhsm_archive_rules had ignore_fileclass = empty_files; configured. Removing that line makes the copytool archive empty files as well, which ensures their entries end up in the SOFT_RM table so they can be restored. All the other object types (symlinks, pipes, empty directories) still have the problem, so this bug is still relevant.

geraldhofer, Jul 03 '17 02:07