writeFileAtomic retry strategy could be improved with threadDelay
https://github.com/haskell/cabal/blob/c2d835310c0e8c0b1008c8970e2f5f085c38078c/Cabal-syntax/src/Distribution/Utils/Generic.hs#L184-L200
So if renameFile failed we immediately try copyFile followed by removeFile. My experience however is that this is not helpful, at least on Windows. The most common reason for renameFile to fail is because another process (typically, an antivirus) opened the source file for reading. If this is the case then copyFile is likely to be too short delay for removeFile to succeed, because the source file remains locked by antivirus.
To make it worse, copyFile from directory internally makes its own attempt at atomicity, creating a temporary file and renaming it, which triggers the antivirus yet again.
In short, if renameFile fails then there is no reason to expect copyFile / removeFile to succeed.
I'd suggest to scrap copyFile / removeFile catch: just keep retrying renameFile after threadDelay 10000 a few times more.
CC @GulinSS who was looking at writeFileAtomic recently.
Here I suspect that the issue with c_openat is due of fact we do not use system-defined O_*. So, they are just hardcoded.
When O_CREAT check happened at ghc js-backend side, it does not recognize O_CREAT and fails when Cabal's Setup.hs compiled to js is running.
Looking around for a solution. Probably unix package should be patched to use Base.o_CREAT vs (#const O_CREAT) because o_CREAT can have a different value at stage1. Not system's one.
Do we have a confirmation it's a problem in unix? Anything in cabal that should be fixed/improved, too?
Unix package provides primitives which are used by cabal. Cabal makes assumptions about "atomicity" which really have no strong reasons for file movements. So, if cabal wants to continue using mv pattern with cp+rm it should take into consideration that other platforms like Windows have special cases which @Bodigrim said at the original message.
So , shortly say, yes, mv with cp+rm is cabal stuff not unix, and cabal is a right place where probably need take into account other platforms like Windows where additional delays might be useful.
Will we be happy with an atomic replacement of the file contents?
In Windows, the ReplaceFile operation is atomic on move.
I'm not well versed in this area. My impression is that:
- A "low-levelish" (= without retries and fallbacks)
atomicWriteFileshould live infile-io, see https://github.com/haskell/file-io/issues/35 and issues linked. cabal-installshould build upon it, adding a small delay (~1 ms) and a few retries. It's very annoying when Cabal process on a remote build server aborts an hour-long compilation because of an intermittent issue with renaming.
A "low-levelish" (= without retries and fallbacks) atomicWriteFile should live in file-io
Yeah, it makes lot of sense
I'm not well versed in this area. My impression is that:
- A "low-levelish" (= without retries and fallbacks)
atomicWriteFileshould live infile-io, see Add atomicWriteFile file-io#35 and issues linked.cabal-installshould build upon it, adding a small delay (~1 ms) and a few retries. It's very annoying when Cabal process on a remote build server aborts an hour-long compilation because of an intermittent issue with renaming.
I like this idea, but I also have a question, why does file-io duplicate directory?
why does
file-ioduplicatedirectory?
I'm not quite sure what specifically you are refering to, but the general picture is that boot libraries have a tension who depends on whom. Current situation is that it's directory who depends on file-io, so if file-io needs anything implemented in directory it has to copy relevant bits and pieces. An obvious solution would be to factor out the common part into yet another package, but proliferating the number of boot libraries is costly, so it's more pragmatic to stick to status quo and duplicate things.
To be clear, I'm not a maintainer of any of the libraries mentioned.
The reason renameFile might fail and copy file succeed is if the targetPath and tmpDir are on different file systems. But the solution to that is to create the temp file in the directory of the target path. That's guaranteed to the the same file system.
Although that is guaranteed to succeed, the path can get too long, as it does with backpack. This PR is probably related to this thread https://github.com/haskell/cabal/pull/10366