Recursively delete directories on Windows without race conditions
Niall Douglas' "Racing the File System" at CppCon actually outlines a deletion algorithm that works correctly on Windows. It does not require retries.
The gist of it:
- Deletion is effectively asynchronous on Windows.
- If deleting recursively, you want to rename the item you're deleting out of the directory you're deleting first, then delete it.
This provides an alternative solution to the issue described in #96 and #108.
Alternative
It seems there's yet another way to do this using Windows 10 V1607's POSIX-compatible API:
SetFileInformationByHandle(FILE_DISPOSITION_INFO_EX { FILE_DISPOSITION_DELETE | FILE_DISPOSITION_POSIX_SEMANTICS })
But this is limited to NTFS.
https://stackoverflow.com/a/50234974
This is described on slide 11 of the talk.
That makes very little sense. Slide 19 indicates that you can't rename the file if any process has a read lock. Which is exactly the problem.
The rename method only allows you to prevent new processes to acquire read locks on the file if they are trying to open the file by name since renames are atomic vs delete files which is just marking for deletion.
It explicitly says you still have to retry. It also doesn't take into account services that lock any new file like an AV. After you rename that file can still get locked. So I don't understand how that method "solves" it.
Sorry, I think I may have conflated two issues:
-
Sometimes another program like the indexer or AV might keep a file / directory locked for a brief amount of time, preventing deletion. This approach does not address this problem.
-
Deletion is asynchronous, so trying to delete a parent directory after "deleting" its children may fail because the children are still pending deletion. This issue is intended to address this problem.
I'd say this algorithm makes it less atomic and makes things more complicated in case of partial failure and exceptions. You'll end up with stray directories, no? Retry is a viable solution for both problems afais.
Retry is a viable solution for both problems afais.
The inclusion of retrying functions in base is still contentious (see here).
I personally don't understand why it's being added to base and not a uncoupled library like directory. Adding it to directory makes more sense since it can then be used with any version of the compiler.
Adding it to base restricts it to the bleeding edge for no reason. If this is really so contentious I'll just add it to Win32 and move on. But i don't want to split functionality across libraries unless needing to.
I think we're over engineering this and over thinking a simple solution. But that's just my two cents.
If this is really so contentious I'll just add it to Win32 and move on.
It appears that this would be the least disruptive solution. This is a Windows-specific problem, and as discussed in the GHC tracker, retrying is kind of problem-specific.
You'll end up with stray directories, no?
It looks like you can use FILE_FLAG_DELETE_ON_CLOSE to avoid stray items. The Rust implementation below seems like a good reference:
https://docs.rs/crate/remove_dir_all/0.5.3/source/src/fs.rs
On another note, is there any reason SHFileOperation({wFunc: FO_DELETE}) can't be used? That would lead to the simplest solution, avoiding a lot of the complexity.
The only counterargument I've seen is that the API does not work on handles https://github.com/rust-lang/rust/issues/29497#issuecomment-187353954, which does not apply to directory.
You'll end up with stray directories, no?
It looks like you can use
FILE_FLAG_DELETE_ON_CLOSEto avoid stray items. The Rust implementation below seems like a good reference:https://docs.rs/crate/remove_dir_all/0.5.3/source/src/fs.rs
FILE_FLAG_DELETE_ON_CLOSE introduces a race condition because unlike Unix the file entry remains but is marked unusable vs the inode the only thing remaining. Your DeleteFile operation will complete successfully but the filename is unusable until the last handle is closed.
This means if the programmer tries so open a new file with the same name/path they will get an access denied on CreateFile. This quite literally just moves the problem.
On another note, is there any reason
SHFileOperation({wFunc: FO_DELETE})can't be used? That would lead to the simplest solution, avoiding a lot of the complexity.The only counterargument I've seen is that the API does not work on handles rust-lang/rust#29497 (comment), which does not apply to
directory.
It is not entirely clear to me that these shell API functions support NT namespace paths. https://docs.microsoft.com/en-us/windows/win32/api/shellapi/ns-shellapi-shfileopstructw suspiciously mentions the buffer must be large enough to hold MAX_PATH. If they don't then it's a major regression and won't work for Haskell programs.
It is also not entirely clear that they actually solve the problem. The function has a documented return code for when a sharing violation occurred during the operation. In fact the recursive delete that it provides can fail for some files so it can return partial failures. Now you just won't know which files failed.
To clarify, the scope of the current issue is:
-
Make sure that recursively deletion does not have internal race conditions -- it should not "trip over itself" as it ascends the directory hierarchy.
-
Allow deletion to proceed even if files are open by other processes as long as they enable
FILE_SHARE_DELETE.
To maintain current behavior, support for long paths would be a requirement.
On the other hand, the following are out of scope of this issue:
-
The asynchronicity of deletion on Windows: it is not possible to offer a truly synchronous deletion API that guarantees termination except in limited circumstances, e.g. NTFS on Win 10 V1607.
That being said, on can reduce the impact on subsequent operations that expect the directory to be gone by renaming trick. Cygwin's implementation goes a step further by using the Recycle Bin for this purpose.
-
Lack of knowledge about the state of the target tree after a partially failed deletion: The current API does not provide any information on that and there are no plans to change that.
https://bugs.python.org/issue40143#msg365697 noted that DeleteFileW on Windows 10 RS1 uses FILE_DISPOSITION_POSIX_SEMANTICS by default on NTFS systems. RemoveDirectoryW might still be affected however.
Given that, I'm curious whether Microsoft plans to (eventually) fix the issue on their end though. If that's their plan it may be better to stick to the conventional Win32 APIs and wait for them to fix it than to create our own unique workarounds.