hackage-security
hackage-security copied to clipboard
A couple spots async exceptions aren't handled gracefully
Based on https://github.com/commercialhaskell/stack/issues/3073 , I suspected hackage-security wasn't handling async exceptions quite right. Indeed, a search for SomeException led me to find the following spots where SomeException gets caught and not rethrown:
Relevant to the linked issue:
https://github.com/haskell/hackage-security/blob/226e1e43ba1b9ac34cab6b1564604b86b50e9129/hackage-security/src/Hackage/Security/Client/Repository/Remote.hs#L278
Part of hackage-repo-tool:
https://github.com/haskell/hackage-security/blob/226e1e43ba1b9ac34cab6b1564604b86b50e9129/hackage-repo-tool/src/Hackage/Security/RepoTool/Util/IO.hs#L84
I also noticed that the UpdateFailed exception constructor is never used - https://github.com/haskell/hackage-security/blob/226e1e43ba1b9ac34cab6b1564604b86b50e9129/hackage-security/src/Hackage/Security/Client/Repository.hs#L336
@mgsloan thanks.
@edsko looking at that use of catchChecked, I don't see how the Throws e => constraint is resolved. There doesn't seem to be any instance around.
@edsko looking at that use of catchChecked, I don't see how the Throws e => constraint is resolved. There doesn't seem to be any instance around.
That's the whole point. https://www.well-typed.com/blog/2015/07/checked-exceptions/ .
fwiw, I'd suggest adding a link to that blogpost to the Hackage.Security.Util.Checked module (which provides catchChecked) as the concept is really a little bit non-obvious to the non-initiated... :-)
It may also be good to have it wait for the lock to become available - https://github.com/commercialhaskell/stack/issues/3055#issuecomment-291018128
Could this be the case why https://github.com/commercialhaskell/stack/issues/3055 happens even with just ctrl-c (sigint/sigterm)? I've seen it happen on CI, where I'd be surprised anything is calling SIGKILL.
Managed to reproduce it with:
$ rm -rf ~/.cabal
$ cabal update
# wait 15s
ctrl-c
$ cabal update
Downloading the latest package list from hackage.haskell.org
/home/ielectric/.cabal/packages/hackage.haskell.org/hackage-security-lock: createDirectory: already exists (File exists)
I think this is because of https://github.com/haskell/hackage-security/blob/71a24d63f776c62a532c6d68e56aca42ece7640e/hackage-security/src/Hackage/Security/Util/IO.hs#L33-L48
So this seems to just create a plain directory to implement a lock. Any form of createDirectory >> removeDirectory as done there cannot survive SIGKILL or computer crashes. I would have expected this to be implemented with flock().
@nh2 unfortunately flock() isn't portable (a problem we're having trouble with already in cabal; but I'm working on something to address that; once I'm done we can use it hackage-security too)
https://hackage.haskell.org/package/filelock-0.1.1.2/docs/System-FileLock.html works well on Windows as long as you don't use lock paths for anything.
@domenkozar I'm aware of the prior art (and there's a lot of history regarding file/record locking in Unix...) :-)
@nh2 unfortunately flock() isn't portable (a problem we're having trouble with already in cabal; but I'm working on something to address that; once I'm done we can use it hackage-security too)
@domenkozar I'm aware of the prior art :-)
So what are you working on to address this?
@domenkozar I'm aware of the prior art :-)
Ah sorry, I assume you don't due to your previous comment. I guess my implicit question was if we can use filelock to solve this properly and what would be the cons (besides obvious extra dependency) to fix this bug?
I've written a commit to demonstrate one failure mode for directory-based file locking:
https://github.com/snoyberg/hackage-security/commit/ffb83f9f61969b8792a797058b04c68ab88cf112
Quoting the commit message:
Due to the nature of directory-based locking, if the process exits before exception handlers are run, the directory is never removed and the system remains in a "locked" state indefinitely. This can happen due to SIGKILL or power failure, as well as a blocked thread when main exits. This repro follows that last approach, forking a thread and then letting main exit while that locked thread delays.
Regarding the larger issue of async exception safety here: one possibility is to implement the same technique used in safe-exceptions in the Hackage.Security.Util.Checked module, namely:
- Differentiate at the type level between sync and async exceptions using the
SomeAsyncExceptiontype - Ensure that all exceptions passed to
throwIOare synchronous - Disallow
catchChecked,handleChecked, andtryCheckedfrom catching any asynchronous exception.
Note that, AFAICT, the async exception safety issue is almost entirely orthogonal to the issue of using directory creation/removal as a form of locking.
I've opened up #202 with the async exception detection logic I described above.
I can also report that, with my patch in #202, Stack no longer has the buggy behavior described in commercialhaskell/stack#3073 (see my comment https://github.com/commercialhaskell/stack/issues/3073#issuecomment-365297649).