Rmtree for nfs

Open zulissimeta opened this issue 1 year ago • 6 comments

Summary of Changes

Some filesystems like NFS don't delete files right away, especially if another node might still be holding them open. This manifests as files like .nfs.... lingering in the directory being deleted. rmtree then raises an error when, after deleting the files, it tries to delete the folder itself.

This PR adapts a utility from another open-source (BSD-3-Clause licensed) repo that deals with this problem on clusters with NFS. Basically, if we hit one of the common errors (resource or device is busy), we wait a few seconds and try again.
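A minimal sketch of the wait-and-retry idea described above, not the code in this PR. The function name `rmtree_with_retry`, the `retries` and `delay` parameters, and the choice of `EBUSY`/`ENOTEMPTY` as the retryable errors are all assumptions made for illustration:

```python
import errno
import shutil
import time

# Errno values treated as transient here (an assumption, not the PR's exact list).
RETRYABLE_ERRNOS = {errno.EBUSY, errno.ENOTEMPTY}


def rmtree_with_retry(path, retries=3, delay=2.0):
    """Remove a directory tree, retrying when the filesystem reports it busy."""
    for attempt in range(retries + 1):
        try:
            shutil.rmtree(path)
            return
        except FileNotFoundError:
            # Already gone (possibly removed by another node); nothing left to do.
            return
        except OSError as err:
            # Re-raise anything that is not a known transient error,
            # or if we have run out of attempts.
            if err.errno not in RETRYABLE_ERRNOS or attempt == retries:
                raise
            time.sleep(delay)
```

On a local filesystem the first attempt normally succeeds, so the retry path only costs time on filesystems where deletion lags.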

I don't know how to reproduce this in a test without an NFS filesystem, but the existing tests should cover file deletion working as expected.

Requirements

Note: If you are an external contributor, you will see a comment from @buildbot-princeton. This is solely for the maintainers.

zulissimeta avatar Aug 19 '24 16:08 zulissimeta

Can one of the admins verify this patch?

buildbot-princeton avatar Aug 19 '24 16:08 buildbot-princeton

Codecov Report

Attention: Patch coverage is 55.55556% with 8 lines in your changes missing coverage. Please review.

Project coverage is 98.16%. Comparing base (bd45301) to head (49f3756). Report is 256 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/quacc/runners/prep.py | 55.55% | 8 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2434      +/-   ##
==========================================
- Coverage   98.38%   98.16%   -0.23%     
==========================================
  Files          85       85              
  Lines        3477     3495      +18     
==========================================
+ Hits         3421     3431      +10     
- Misses         56       64       +8     

codecov[bot] avatar Aug 19 '24 16:08 codecov[bot]

@zulissimeta: Happy to accept something along these lines. First, a quick question though: is it unavoidable that the .nfs files will be present? For instance, this comment suggests that I/O on the node could cause this (e.g. from the logging module). I want to confirm that killing the logger before calling shutil.rmtree isn't sufficient here in terms of a fix.
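For illustration only, closing the logger's file handlers before the delete might look like the sketch below. It is an assumption that any log file lives inside the directory being purged, and `close_logs_then_rmtree` is a hypothetical helper, not part of quacc:

```python
import logging
import shutil
from pathlib import Path


def close_logs_then_rmtree(directory, logger=None):
    """Close FileHandlers writing inside `directory`, then remove the tree.

    Returns the number of handlers that were closed.
    """
    if logger is None:
        logger = logging.getLogger()  # root logger by default
    root = Path(directory).resolve()
    closed = 0
    for handler in list(logger.handlers):
        if isinstance(handler, logging.FileHandler):
            log_path = Path(handler.baseFilename).resolve()
            # Only touch handlers whose log file sits under the target directory.
            if root in log_path.parents:
                handler.close()
                logger.removeHandler(handler)
                closed += 1
    shutil.rmtree(directory)
    return closed
```

If the open file handle comes from somewhere other than logging, this does nothing useful, which is exactly what the question is trying to establish.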

Andrew-S-Rosen avatar Aug 19 '24 21:08 Andrew-S-Rosen

> @zulissimeta: Happy to accept something along these lines. First, a quick question though: is it unavoidable that the .nfs files will be present? For instance, this comment suggests that I/O on the node could cause this (e.g. from the logging module). I want to confirm that killing the logger before calling shutil.rmtree isn't sufficient here in terms of a fix.

I haven't been able to reproduce it consistently. Is the logger writing local files?

zulissimeta avatar Aug 19 '24 22:08 zulissimeta

> > @zulissimeta: Happy to accept something along these lines. First, a quick question though: is it unavoidable that the .nfs files will be present? For instance, this comment suggests that I/O on the node could cause this (e.g. from the logging module). I want to confirm that killing the logger before calling shutil.rmtree isn't sufficient here in terms of a fix.
>
> I haven't been able to reproduce it consistently. Is the logger writing local files?

I think by default it writes to stderr, if relevant. So it depends on whether you are writing stderr to disk. I suppose that's likely not the problem, since your stderr would likely not be in the directory being purged. In any case, I just made this toggleable in https://github.com/Quantum-Accelerators/quacc/pull/2436.

Andrew-S-Rosen avatar Aug 20 '24 00:08 Andrew-S-Rosen

@zulissimeta: Out of curiosity, are you running ASE relaxations? I think there might be a problem with the trajectory not closing properly. I have vague recollections of this...

Andrew-S-Rosen avatar Aug 20 '24 01:08 Andrew-S-Rosen