rmtree for NFS
Summary of Changes
Some filesystems, like NFS, don't delete files immediately, especially if another node might still be holding a handle to them. This manifests as files like .nfs... appearing in the directory being deleted, and rmtree fails after deleting the files when it then tries to delete the folder.
This PR adapts a utility from another open-source (BSD-3-Clause licensed) repository that deals with this problem on clusters using NFS. Basically, if we hit one of the common errors ("device or resource busy"), we wait a few seconds and try again.
I don't know how to reproduce this in a test without an NFS filesystem, but the existing tests should cover that file deletion still works as expected.
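The retry-on-busy approach described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function name, retry count, and delay are hypothetical, and the errno values shown are the ones commonly raised when lingering .nfs* files make a directory appear busy or non-empty.

```python
import errno
import shutil
import time


def safe_rmtree(path: str, max_retries: int = 5, delay: float = 1.0) -> None:
    """Remove a directory tree, retrying on errors that NFS commonly
    raises while .nfs* "silly-rename" files are still present.

    Hypothetical sketch: names and defaults are illustrative only.
    """
    for _ in range(max_retries):
        try:
            shutil.rmtree(path)
            return
        except OSError as err:
            # EBUSY ("device or resource busy") and ENOTEMPTY are the
            # typical errors when an .nfs* file lingers in the tree.
            if err.errno not in (errno.EBUSY, errno.ENOTEMPTY):
                raise
            # Give the NFS client time to release the stale handle.
            time.sleep(delay)
    # Final attempt; let any remaining error propagate to the caller.
    shutil.rmtree(path)
```

On a local filesystem the first attempt succeeds immediately, so the retry loop only matters on NFS mounts where the deletion races with handle release.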
Requirements
- [x] My PR is focused on a single feature addition or bugfix.
- [x] My PR has relevant, comprehensive unit tests.
- [x] My PR is on a custom branch (i.e. is not named `main`).
Note: If you are an external contributor, you will see a comment from @buildbot-princeton. This is solely for the maintainers.
Can one of the admins verify this patch?
Codecov Report
Attention: Patch coverage is 55.55556% with 8 lines in your changes missing coverage. Please review.
Project coverage is 98.16%. Comparing base (bd45301) to head (49f3756). Report is 256 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/quacc/runners/prep.py | 55.55% | 8 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #2434 +/- ##
==========================================
- Coverage 98.38% 98.16% -0.23%
==========================================
Files 85 85
Lines 3477 3495 +18
==========================================
+ Hits 3421 3431 +10
- Misses 56 64 +8
@zulissimeta: Happy to accept something along these lines. First, a quick question though: is it unavoidable that the .nfs files will be present? For instance, this comment suggests that I/O on the node could cause this (e.g. from the logging module). I want to confirm that killing the logger before calling shutil.rmtree isn't sufficient here in terms of a fix.
I haven't been able to reproduce it consistently. Is the logger writing local files?
I think by default it writes to stderr, if relevant. So, it depends on whether you are writing stderr to disk. I suppose that's likely not the problem, since your stderr would probably not be in the directory being purged. In any case, I just made this toggleable in https://github.com/Quantum-Accelerators/quacc/pull/2436.
@zulissimeta: Out of curiosity, are you running ASE relaxations? I think there might be a problem with the trajectory not closing properly. I have vague recollections of this...