helix
helix copied to clipboard
Workaround for Large File Corruption During Save
PR as a workaround for #3967
I could also imagine
tokio::fs::rename(&tmp_path, path).await?;
being used instead of a copy and remove
https://docs.rs/tokio/latest/tokio/fs/fn.rename.html
Using rename is probably better since copy will lead to the same issue. I think it would also be better to create the temp file in the cache directory.
I went the route of making the cached file match the filename instead of having a file called "cachefile" or something just in case multiple saves and crashes occur at the same time
I don't get why the builds are failing. The file saves to exactly where it's supposed to. Could it potentially be an async thing, where it's checking too quickly and the file hasn't been copied to the new location yet?
I'm not sure. The async operation should be completed successfully before it's checked. @dead10ck
Since it's writing to the cache folder, maybe the CI doesn't allow that?
Since it's writing to the cache folder, maybe the CI doesn't allow that?
Maybe? But I would assume it isn't looking there, it should be looking for the file that got created in the working directory? In which case, it saves to the temp file, then moves it to the correct location, then the assert should look there, yeah?
oh you mean that it could be failing to write to the cache folder, therefore it never gets moved into the correct spot afterwards... That makes sense to me
If it wasn't able to write to the cache directory, then the write operation would fail. I'm assuming that's why the file is empty.
If it wasn't able to write to the cache directory, then the write operation would fail. I'm assuming that's why the file is empty.
wait, no, because I ran the integration tests locally just now and they still fail, so it can't be that
Could you test with reverting the rename() change?
Could you test with reverting the
rename()change?
yeah that made it pass
super weird
I'm going to upload this version for now. Maybe dead10ck has some idea of what's going on, I sure don't :sweat_smile:
This will not work if the new name is on a different mount point.
Interesting. That's probably what it is. The test is making a temporary file in /tmp, which is often on a different mount point than /home. That's an odd and unfortunate limitation. Copying doubles the amount of IO for each file save.
Edit: nope, spoke too soon. Could you change the log level to debug here?
hmmmm, the integration test is still failing. Works when I run them locally though, even with the copy/delete
need to sign off for the night. I'll check back throughout the weekend though and keep poking around with you guys :+1:
Oh I think I know what's going on: the cache directory isn't being created, so the write is probably failing because the destination directory doesn't exist. The cache directory is only created in the main function
https://github.com/helix-editor/helix/blob/a3ed9169df8d0a56ab4f698d2c90e7e6b4404633/helix-term/src/main.rs#L41-L47
but the integration tests don't run that code. They build an Application and inject right into the main event loop. So we'll either have to have the integration testing harness or the write path create the directories.
Oooh, that would track for sure, since the change I added replace was the same time we moved it to the cache directory
As for the rename() IO issue, I wonder if we could use that by default, and if it fails try to do the copy approach.
As for the
rename()IO issue, I wonder if we could use that by default, and if it fails try to do the copy approach.
That would need to be directly in the integration test though, right? Since it actually does work the way it's supposed to in practice?
If it's failing because tmp is on a different mount point, then I think it would be better to have it be in save_impl.
Build failed again. It's strange, because it succeeds on my machine. It's got to have something to do with the cache directory location, because when I was caching the file in the working directory, all the builds succeeded in the CI environment
Is there maybe a way to test if the cache directory is on a different mount point than the working directory? It occurs to me that this would need to be addressed regardless
That way we only do the copy in the scenario where rename wouldn't work
The problem with copying is it doesn't actually solve the problem this PR is trying to address, i.e. it could fail in the middle of copying over the target file. Renaming is usually atomic, so you can be sure that if the file wrote to a temporary file successfully that it will either replace the target file whole and complete, or it will fail to rename for some reason and leave the original untouched.
It seems like we have to use rename to get what we want, but in order to do that, we have to make sure the temporary file is on the same mount point. We might be able to find a way to check if it's on the same mount point, but probably not in a trivially cross platform way. The simplest solution might be to just not use the cache directory after all. Writing the temporary file to the same directory as the target file is a really easy way to make sure the mount point is the same.
How's this, I'll revert back to the working directory version of the cache file, and we could open a harder issue for finding a clever way to get it in the cache?
The working directory might not be where you're writing the file though, the path of the file could be on a different mount point. I think we have to parse out the parent directory of the document path we're trying to write to.
Oh, yeah. Sorry, I meant saving the temp file wherever the original file is located
Can we verify how this is done in other editors? I imagine this is very space inefficient for large files during the save process
Far as I can tell from a quick search, vim uses swap files with metadata about the session, like undo/redo and what changed were made. It autosaves every so often to that file for recovery during a crash.
Not sure about how the saving process goes to actually prevent the issue in the first place though, but we could possibly look into changing lines in-place or something?