micro icon indicating copy to clipboard operation
micro copied to clipboard

Micro saves files non-atomically

Open dmaluka opened this issue 1 year ago • 16 comments

When saving a file, micro simply overwrites it with the new contents. So if writing is interrupted in the middle due to some error (e.g. running out of disk space, unplugging USB stick and so on), the file is left in a damaged state: partially written or just empty (i.e. the user's data is lost).

To prevent this, micro should save files atomically: first write to a temporary file, and then, if write completes successfully, replace the user's file with this temporary file. AFAIK this is how it is done by other editors.

This issue concerns not just the edited files themselves, but also other files that micro may modify: settings.json, bindings.json, backups and so on.

Commit hash: 59dda01c OS: any Terminal: any

dmaluka avatar Feb 20 '24 22:02 dmaluka

This issue concerns not just the edited files themselves, but also other files that micro may modify: settings.json, bindings.json, backups and so on.

This actually happened to me today in real life: I started fetching a huge repo in the background, and went on using micro. The fetch eventually ate up all the disk space, and when I started micro yet another time, it showed me Error reading settings.json: unexpected end of JSON input error, since my settings.json was empty; and the command history was empty too. Luckily the actual files I was working with using micro weren't damaged, since I wasn't really modifying them.

dmaluka avatar Feb 20 '24 22:02 dmaluka

I see other people have also already encountered this problem: https://github.com/zyedidia/micro/issues/1916

dmaluka avatar Feb 21 '24 17:02 dmaluka

So like in the most of the editors overwriteFile() should write to name + e.g. ".tmp" and then use os.Rename() to move?

JoeKar avatar Feb 21 '24 18:02 JoeKar

More or less. We might need to figure out how to do it fully reliably (or as reliably as possible), e.g. what if the file is a symlink or a bind mount, and so on. And what about Windows (I heard it's trickier to ensure atomicity there).

dmaluka avatar Feb 22 '24 03:02 dmaluka

Had already an look into the os interfaces, but unfortunately it doesn't provide that capability (like copy-on-write and move afterwards) with that abstraction already. And yes, Windows could become tricky as well as the integration into the present su-approach. So from current perspective it looks like reinventing the wheel again. :(

JoeKar avatar Feb 22 '24 17:02 JoeKar

There are some 3rd-party packages e.g. renameio but I haven't looked into that deeply.

dmaluka avatar Feb 22 '24 19:02 dmaluka

There are some 3rd-party packages e.g. renameio but I haven't looked into that deeply.

A naive approach to the problem is to create a temporary file followed by a call to os.Rename().

...and so I was blamed again. :smile: But it sounds worth to it add it as a new dependency, because we then don't need to take care of these corner cases, because someone already did and implemented it for general purpose.

As it is not possible to provide a correct implementation, this package does not export any functions on Windows.

On Windows we can still run into trouble. Due to writefile.go#L15-L16 it looks like we can use this interface under POSIX only.

JoeKar avatar Feb 23 '24 07:02 JoeKar

Seems like any solution (even the naïve solution with os.rename) would be better than the current situation where a corruption is guaranteed to occur.

X-Ryl669 avatar Mar 14 '24 09:03 X-Ryl669

Yes and no. If for example the user's file is a symlink, and we just rename() it with our temporary file, the user's file silently becomes a regular file instead of a symlink. Which is arguably even worse than the current situation, since it will happen always, not just when an I/O failure occurs.

dmaluka avatar Mar 14 '24 10:03 dmaluka

Honestly, I don't know if it's a Go limitation or not. On Posix, you're safe to use renameat that will do exactly what you expect since you provide opened file descriptor for the rename operation, thus you've already followed the link to get the file descriptor.

If it's not available everywhere, I guess a readlink is to ensure the target of the link is selected for the new parameter of rename (but it suffers from TOCTOU issues as usual)

X-Ryl669 avatar Mar 14 '24 11:03 X-Ryl669

Also, I don't agree with the idea that leaving a temporary file upon failure is bad. I think, on the opposite that it's a welcome feature. If I edit a file other a SSH session and the network suddenly goes down or a power outage, I'd be a lot more happy to find a .tmp file with the current edited buffer than nothing (like if you used O_TMPFILE).

X-Ryl669 avatar Mar 14 '24 11:03 X-Ryl669

Also, I don't agree with the idea that leaving a temporary file upon failure is bad.

Sure, we should keep the temporary file upon a failure, since it contains user data which would be lost otherwise. But I was referring to a different case. I was referring to a case when there is no failure, and as a result, the user's symlink is silently overwritten with a regular file (although the contents of this file are correct). So the user now suddenly has two version of the file (with the old contents and with the new contents) instead of two references to the same file (with the new contents), and has no idea about it.

On Posix, you're safe to use renameat that will do exactly what you expect since you provide opened file descriptor for the rename operation, thus you've already followed the link to get the file descriptor.

I don't think this is true. The descriptor renameat() is provided with is the descriptor of the directory where the file is, not the descriptor of the file itself.

For what it's worth:

~ % echo aaa >test1
~ % ln -s test1 test2
~ % ls -l test1 test2
-rw-r--r-- 1 mitya mitya 4 Mar 14 12:22 test1
lrwxrwxrwx 1 mitya mitya 5 Mar 14 12:22 test2 -> test1
~ % cat test2
aaa
~ % echo bbb >test3
~ % mv test3 test2
~ % ls -l test1 test2
-rw-r--r-- 1 mitya mitya 4 Mar 14 12:22 test1
-rw-r--r-- 1 mitya mitya 4 Mar 14 12:23 test2
~ % cat test2
bbb
~ % cat test1
aaa

and the strace of mv (the gist of it):

renameat2(AT_FDCWD, "test3", AT_FDCWD, "test2", RENAME_NOREPLACE) = -1 EEXIST (File exists)
stat("test2", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
newfstatat(AT_FDCWD, "test3", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "test2", {st_mode=S_IFLNK|0777, st_size=5, ...}, AT_SYMLINK_NOFOLLOW) = 0
rename("test3", "test2")                = 0

dmaluka avatar Mar 14 '24 11:03 dmaluka

BTW it should be worth looking how Vim handles the issue when saving files (and analyzing whether what Vim is doing is reasonable). At first glance it looks like it reuses the backup file for that (or rather, what vim calls the "backup file" is only used when saving the file): vim first tries to overwrite the target file with the contents of the backup file, and if that fails, it resorts to rename()ing the backup file to the target file.

dmaluka avatar Mar 14 '24 11:03 dmaluka

You're right. I've confused linkat with renameat. You have a AT_FOLLOW_LINKS or something like this but it's following the old symlink not the new one. There isn't any function to do that atomically on linux.

So you're back to the basic option of doing a readlink on the destination (if it's a link) and update the path accordingly.

That's what vim is doing

X-Ryl669 avatar Mar 14 '24 14:03 X-Ryl669

As this issue is listed as open, I'd like to offer a possibly relevant input--re inode numbers.

On POSIX anyway, sometimes it's worthwhile to preserve the inode number, and thus any outstanding hard links, which does not happen with the (write_new && rename) method. Also affects inotify for instance.

This is one of the reasons why I was a happy vim user for so many years--knowing that vim wouldn't silently break things that depend on this, which I had been burnt by enough times to form a personal opinion on the subject.

Currently seems like micro preserves the inode number when saving, which I like. I would even go so far as to suggest that there should be a unit test such that this invariant is preserved where possible with the default config.

jmcorey avatar Aug 07 '24 19:08 jmcorey

Yep, in the PR #3273 (not finished) we have already abandoned the idea to use rename (see this discussion: https://github.com/zyedidia/micro/pull/3273#discussion_r1599033321), and instead decided to always use a non-atomic method: first write the buffer to a temporary (backup) file, and then write the same buffer to the original file, - so that we cannot guarantee that we won't damage the original file, but we can guarantee that if we damage it, we have a backup to recover the original file. That is less convenient to the user (in case a damage actually happens), but it always preserves the original inode, with all the advantages that come with that.

dmaluka avatar Aug 07 '24 20:08 dmaluka