helix Workaround for Large File Corruption During Save

PR as a workaround for #3967

Oct 14 '22 20:10 ngraham20

I could also imagine

tokio::fs::rename(&tmp_path, path).await?;

being used instead of a copy and remove

https://docs.rs/tokio/latest/tokio/fs/fn.rename.html

Oct 14 '22 20:10 ngraham20

Using rename is probably better since copy will lead to the same issue. I think it would also be better to create the temp file in the cache directory.

Oct 15 '22 00:10 kirawi

I went the route of making the cached file match the filename instead of having a file called "cachefile" or something just in case multiple saves and crashes occur at the same time

Oct 15 '22 01:10 ngraham20

I don't get why the builds are failing. The file saves to exactly where it's supposed to. Could it potentially be an async thing, where it's checking too quickly and the file hasn't been copied to the new location yet?

Oct 15 '22 02:10 ngraham20

I'm not sure. The async operation should be completed successfully before it's checked. @dead10ck

Oct 15 '22 02:10 kirawi

Since it's writing to the cache folder, maybe the CI doesn't allow that?

Oct 15 '22 02:10 kirawi

Since it's writing to the cache folder, maybe the CI doesn't allow that?

Maybe? But I would assume it isn't looking there, it should be looking for the file that got created in the working directory? In which case, it saves to the temp file, then moves it to the correct location, then the assert should look there, yeah?

Oct 15 '22 02:10 ngraham20

oh you mean that it could be failing to write to the cache folder, therefore it never gets moved into the correct spot afterwards... That makes sense to me

Oct 15 '22 02:10 ngraham20

If it wasn't able to write to the cache directory, then the write operation would fail. I'm assuming that's why the file is empty.

Oct 15 '22 02:10 kirawi

If it wasn't able to write to the cache directory, then the write operation would fail. I'm assuming that's why the file is empty.

wait, no, because I ran the integration tests locally just now and they still fail, so it can't be that

Oct 15 '22 02:10 ngraham20

Could you test with reverting the rename() change?

Oct 15 '22 02:10 kirawi

Could you test with reverting the rename() change?

yeah that made it pass

super weird

Oct 15 '22 02:10 ngraham20

I'm going to upload this version for now. Maybe dead10ck has some idea of what's going on, I sure don't :sweat_smile:

Oct 15 '22 02:10 ngraham20

This will not work if the new name is on a different mount point.

Interesting. That's probably what it is. The test is making a temporary file in /tmp, which is often on a different mount point than /home. That's an odd and unfortunate limitation. Copying doubles the amount of IO for each file save.

Edit: nope, spoke too soon. Could you change the log level to debug here?

Oct 15 '22 02:10 dead10ck

hmmmm, the integration test is still failing. Works when I run them locally though, even with the copy/delete

Oct 15 '22 03:10 ngraham20

need to sign off for the night. I'll check back throughout the weekend though and keep poking around with you guys :+1:

Oct 15 '22 03:10 ngraham20

Oh I think I know what's going on: the cache directory isn't being created, so the write is probably failing because the destination directory doesn't exist. The cache directory is only created in the main function

https://github.com/helix-editor/helix/blob/a3ed9169df8d0a56ab4f698d2c90e7e6b4404633/helix-term/src/main.rs#L41-L47

but the integration tests don't run that code. They build an Application and inject right into the main event loop. So we'll either have to have the integration testing harness or the write path create the directories.

Oct 15 '22 03:10 dead10ck

Oooh, that would track for sure, since the change I added replace was the same time we moved it to the cache directory

Oct 15 '22 03:10 ngraham20

As for the rename() IO issue, I wonder if we could use that by default, and if it fails try to do the copy approach.

Oct 15 '22 04:10 kirawi

As for the rename() IO issue, I wonder if we could use that by default, and if it fails try to do the copy approach.

That would need to be directly in the integration test though, right? Since it actually does work the way it's supposed to in practice?

Oct 15 '22 04:10 ngraham20

If it's failing because tmp is on a different mount point, then I think it would be better to have it be in save_impl.

Oct 15 '22 04:10 kirawi

Build failed again. It's strange, because it succeeds on my machine. It's got to have something to do with the cache directory location, because when I was caching the file in the working directory, all the builds succeeded in the CI environment

Oct 15 '22 04:10 ngraham20

Is there maybe a way to test if the cache directory is on a different mount point than the working directory? It occurs to me that this would need to be addressed regardless

Oct 15 '22 04:10 ngraham20

That way we only do the copy in the scenario where rename wouldn't work

Oct 15 '22 04:10 ngraham20

The problem with copying is it doesn't actually solve the problem this PR is trying to address, i.e. it could fail in the middle of copying over the target file. Renaming is usually atomic, so you can be sure that if the file wrote to a temporary file successfully that it will either replace the target file whole and complete, or it will fail to rename for some reason and leave the original untouched.

It seems like we have to use rename to get what we want, but in order to do that, we have to make sure the temporary file is on the same mount point. We might be able to find a way to check if it's on the same mount point, but probably not in a trivially cross platform way. The simplest solution might be to just not use the cache directory after all. Writing the temporary file to the same directory as the target file is a really easy way to make sure the mount point is the same.

Oct 15 '22 04:10 dead10ck

How's this, I'll revert back to the working directory version of the cache file, and we could open a harder issue for finding a clever way to get it in the cache?

Oct 15 '22 04:10 ngraham20

The working directory might not be where you're writing the file though, the path of the file could be on a different mount point. I think we have to parse out the parent directory of the document path we're trying to write to.

Oct 15 '22 04:10 dead10ck

Oh, yeah. Sorry, I meant saving the temp file wherever the original file is located

Oct 15 '22 05:10 ngraham20

Can we verify how this is done in other editors? I imagine this is very space inefficient for large files during the save process

Oct 15 '22 06:10 archseer

Far as I can tell from a quick search, vim uses swap files with metadata about the session, like undo/redo and what changed were made. It autosaves every so often to that file for recovery during a crash.

Not sure about how the saving process goes to actually prevent the issue in the first place though, but we could possibly look into changing lines in-place or something?

Oct 15 '22 12:10 ngraham20

helix helix copied to clipboard

Workaround for Large File Corruption During Save

helix
helix copied to clipboard