sharpfilesystem icon indicating copy to clipboard operation
sharpfilesystem copied to clipboard

Atomic file operations: Replace, AtomicMove

Open binki opened this issue 7 years ago • 6 comments

It is a common pattern when updating a file to:

  1. Write out the new version of the file in the same directory with a different name.
  2. Rename this new file over the existing file using an atomic rename()/File.Move()/File.Replace().

This pattern allows the program to handle crashes (hardware power loss, process termination) without data corruption. The data in the destination file is either the previous version or the new version, not partially updated.

Atomic operations are often restricted. For example, it is not possible to atomically rename a file across filesystem boundaries. The programmer can generally only assume that a rename can be made atomic if the source and target files share the same directory.

One characteristic of methods representing atomic operations is that they fail without touching data when it is detected that the operation cannot be made atomic. See the documentation for File.Replace() for some examples. (Note that in .net, File.Move() will fallback to a non-atomic operation if the operation cannot be done atomically but it is generally safe to assume that renaming a file within a directory will always use the atomic codepath). So, in the sharpfilesystem universe, this would mean that no such thing as a StandardAtomicEntityMover should exist ;-). To support filesystems which wrap others such as FileSystemMounter, some sort of system for unwrapping before searching for operation implementations as discussed in #4 would help a lot.

Because sharpfilesystem already has a non-atomic Move(), I think the atomic variant should be named AtomicMove() to give us two new calls: AtomicMove() and AtomicReplace(). This would most closely imitate the existing .net APIs which seems to be what sharpfilesystem has done in the past. This way, the programmer can decide to call Move() when atomicity is less of a concern and AtomicMove() when the programmer wants the call to fail if atomicity can’t be guaranteed.

Providing atomic operations support out of the box and implementing them in at least PhysicalFileSytsem would enable existing programs using the transactional file replacement pattern to adopt sharpfilesystem. And then such programs could be more easily testable without needing to give them real filesystem access (my goal ;-)).

Let me know your thoughts and if I should try rigging up a PR for this. I have already messed around a bit with trying to do this sort of thing as an addon to sharpfilesystem rather than changing sharpfilesystem itself, but it’d be much easier/cleaner with internal support for unwrapping and I haven’t finished my original approach yet.

binki avatar Aug 02 '16 02:08 binki

What about letting the user decide how Move should work? You could only register atomic movers and leave out the non-atomic ones. That way you on the one hand can ensure that all operations are atomic and on the other hand can allow fallbacks. SharpFileSystem can come with a RegisterAtomicMovers() and a RegisterMovers() variants.

Is that a viable option or should it really need to be separated into its own operation?

I do like the idea of separating Replace and Move. At the moment it isn't clear whether it should overwrite a file or not.

FrozenCow avatar Aug 02 '16 19:08 FrozenCow

In reply to a comment on #4

I’m actually more worried about atomic operations than performance myself.

What is your specific application?

What I’m currently interested in doing is taking existing code written again .net’s System.IO.Directory and System.IO.File and pointing it at a VFS to make it more reusable and testable. That way I can test that the code interacts correctly with the FS layer without having to set up a physical directory structure and add support for new ways of accessing directories and files by just composing different IFileSystems rather than having to complicate the existing code with backing-specific code.

My existing code uses File.Move() and File.Replace() with the assumptions that they’ll behave atomically when renaming files within a directory because it is accessing a directory which other processes might be trying to update at the same time. I want to be able to avoid the situation where I get a fallback to something nonatomic because I don’t want to, e.g., get stuck in the middle of a move and be unable to delete the original filename even though I was able to create the new one.

You could only register atomic movers and leave out the non-atomic ones.

I would prefer to keep them separate so that when I call the API I can be clear about whether or not I care about atomicity at that point. If I set up the operations in one area of code and invoke them in another, having them share the same name makes it easier to accidentally take code that is assuming the operations will be atomic and breaking its assumptions by adding non-atomic movers. Or to support the scenario where you want to set up filesystems in one place but want to be able to perform both atomic and non-atomic (moving files between directories) operations.

I think making it possible for consuming code to explicitly demand particular operations be atomic is worth it and the easiest way would be to have a separate API for the atomic variant.

The way .net does this right now is a bit more wishy washy. If you use File.Move() you can assume it is atomic if you’re moving a file within the same directory. If you were to make sharpfilesystem imitate this behavior, but still use the “fallback to StandardEntityMover” behavior, that’d remove the demand for atomicity further away from the implementation. It would be easy to think that Move() should be atomic for your situation when it still ends up falling back to StandardEntittyMover because, e.g., your backing is wrapped in SealedFileSystem. In that case you’d rather see that the atomic operation can’t complete and then start weighing your option on how to proceed rather than discovering later that you introduced a race condition by accident.

I do like the idea of separating Replace and Move. At the moment it isn't clear whether it should overwrite a file or not.

I’m not sure I understand what you’re saying here. System.IO.File.Move() will always fail if the destination file exists and System.IO.File.Replace() will always fail if the destination file does not exist. Thus it is quite clear which one will and won’t overwrite a file. If you’d really rather emulate POSIX on just this one point, you could provide Rename() patterned after rename() instead as an operation which is

  1. always atomic
  2. willing to overwrite the destination without requiring the destination to exist.

Just this will deviate from the existing .net System.IO.File APIs programs are already written against and make it harder to port existing code from System.IO.File to IFileSystem ;-). API congruence and and allowing calling code to explicitly demand atomicity is why I’m suggesting AtomicMove() and AtomicReplace().

binki avatar Aug 03 '16 12:08 binki

That sounds like a great use-case for sharpfilesystem. MemoryFileSystem will help out a lot of people for testing.

You're right about Replace. I haven't noticed that one. No need to emulate POSIX. However, Move and Replace aren't always atomic I presume?

Using separate operations for the atomic counterparts of Move and Replace sounds like a good idea. However, I think it might be worth introducing IEntityBinaryOperation (similar to IEntityCopier and IEntityMover) as an abstraction for Move, Replace, Copy, AtomicMove, AtomicReplace, etc. It should include the Context with the call:

    public interface IEntityBinaryOperation
    {
        void Execute(IBinaryOperationContext context, IFileSystem source, FileSystemPath sourcePath, IFileSystem destination, FileSystemPath destinationPath);
    }

I'm not 100% sure the above should be the signature, but Move, Copy, etc will need the context. Otherwise it will not be possible to pass through calls from any of the wrapping filesystems. I imagine that most of the operations will behave exactly the same for these wrapping filesystems. Copy-operations will be passed to children-filesystems exactly the same way as AtomicCopy and Move operations. That's why I think it might be worth looking into some kind of abstraction for the different operations.

The downside is that you can mix different types of operations. For example placing a Copy operation in a Move context. That would be bad, but at the same time it is possible to place a hypothetical FileSystemMounterPassThroughOperation into all 4 (or more) operation-contexts.

To circumvent mixing of different types it might be 'nice' to use generics:

    public interface IEntityBinaryOperation<TOperation> where TOperation : IFileSystemOperation
    {
        void Execute(IBinaryOperationContext<TOperation> context, IFileSystem source, FileSystemPath sourcePath, IFileSystem destination, FileSystemPath destinationPath);
    }

and have a operation-type be defined as:

    public interface IFileSystemOperation { }
    public final class MoveOperation: IFileSystemOperation { private MoveOperation() {} }

TOperation can be MoveOperation. It's just to force the typesystem to separate MoveOperations from CopyOperations. This way PhysicalFileSystem operations will be defined as:

public class PhysicalFileSystemCopyOperation: IEntityBinaryOperation<Copy> { ... }

and FileSystemMounterPassThroughOperation can be defined as:

public class FileSystemMounterPassThroughOperation<TOperation>: IEntityBinaryOperation<TOperation> { ... }

It might be a bit overkill though, let me know what you think.

EDIT: The context will look like:

class FileSystemContext: IBinaryOperationContext<MoveOperation>
{
    IBinaryOperation<MoveOperation> IBinaryOperationContext<MoveOperation>.GetOperation(IFileSystem sourceFileSystem, IFileSystem destinationFileSystem) { ... }
}

FrozenCow avatar Aug 04 '16 08:08 FrozenCow

However, Move and Replace aren't always atomic I presume?

Replace() is always atomic, at least on Windows. And as long as you leave the backup parameter null it very likely is atomic on all platforms. (For mono, ReplaceFile() calls _wapi_rename() which delegates to rename(). It looks like mono is not atomic if you use the backup parameter because it renames the target to backup instead of hardlinking it. That is just a bug in mono, though, and only if you use backup.). The types of exceptions that the documentation documents are those hinting that it is atomic:

PlatformNotSupportedException: The operating system is Windows 98 Second Edition or earlier and the files system is not NTFS. If the sourceFileName and destinationFileName are on different volumes, this method will raise an exception.

Not being supported on Windows 98 means the underlying atomic ReplaceFile() API being hinted at requires a version of Windows with that API. And even though those docs do not claim the operation is atomic, this SO answer’s discussion points to Windows documentation that declares ReplaceFile() to be atomic. Thus, with Replace(), we just directly invoke File.Replace() and get an atomic operation.

File.Move() is not necessarily atomic, but it happens to be atomic if you ensure that source and destination are in the same folder. Thus, if I were to implement a PhysicalAtomicMover, my first implementation would require the source and destination to be in the same folder because that is really the baseline assumption one can make. Unfortunately, this would not support the situation where an application manages a directory (e.g., cache directory with subfolders) and assume that the whole subtree is on the same volume (I saw assume because nothing prevents a user from mounting a volume in a subfolder of the program’s cache directory). However, in .net, I don’t think there is a way without P/Invoke to determine if two directories are on the same volume or not. So with PhysicalAtomicMover, it would be impossible to safely make that more permissive. However, if application developers are willing to assume that moves within particular subtrees are atomic, they may define their own IAtomicMover (err, IBinaryOperation registered with AtomicMovers).

If programmers need to, they can come up with approaches matching the APIs. For example, if you really want to move a file across volumes and have atomicity at the target end, you might use the non-atomic Move() to move the file from the first volume to the target directory with a temporary name and then AtomicReplace() to overwrite the final destination file. People coding against the .net API who care about atomicity should already be treating File.Move() as something which will only be reliable for renames within a directory, so having AtomicMove() limit itself to this situation should be sufficient IMO.

I think it might be worth introducing IEntityBinaryOperation (similar to IEntityCopier and IEntityMover) as an abstraction for Move, Replace, Copy, AtomicMove, AtomicReplace, etc.

I don’t have time this minute to review this all, but this seems to make sense especially as part of the infrastructure to support unwrapping filesystems. When I get time I’ll play around with this in a branch.

binki avatar Aug 04 '16 16:08 binki

And I opened a mono bug #43062 regarding its non-atomicity when using the backup parameter.

binki avatar Aug 04 '16 17:08 binki

Nice. I didn't see a mention of atomic properties in https://msdn.microsoft.com/en-us/library/system.io.file.replace(v=vs.110).aspx, that's why I asked. If we have an operation that documents as being atomic, than that's what it should do. Just be careful of MergedFileSystem ;) The usages will get more limited, but it's still good to have the option.

IEntityBinaryOperation could btw (maybe) also be an Action<...>. Not sure if that's a good idea in this particular instance.

FrozenCow avatar Aug 04 '16 20:08 FrozenCow