hackage-security icon indicating copy to clipboard operation
hackage-security copied to clipboard

A couple spots async exceptions aren't handled gracefully

Open mgsloan opened this issue 8 years ago • 15 comments

Based on https://github.com/commercialhaskell/stack/issues/3073 , I suspected hackage-security wasn't handling async exceptions quite right. Indeed, a search for SomeException led me to find the following spots where SomeException gets caught and not rethrown:

Relevant to the linked issue:

https://github.com/haskell/hackage-security/blob/226e1e43ba1b9ac34cab6b1564604b86b50e9129/hackage-security/src/Hackage/Security/Client/Repository/Remote.hs#L278

Part of hackage-repo-tool:

https://github.com/haskell/hackage-security/blob/226e1e43ba1b9ac34cab6b1564604b86b50e9129/hackage-repo-tool/src/Hackage/Security/RepoTool/Util/IO.hs#L84

I also noticed that the UpdateFailed exception constructor is never used - https://github.com/haskell/hackage-security/blob/226e1e43ba1b9ac34cab6b1564604b86b50e9129/hackage-security/src/Hackage/Security/Client/Repository.hs#L336

mgsloan avatar Mar 19 '17 22:03 mgsloan

@mgsloan thanks.

@edsko looking at that use of catchChecked, I don't see how the Throws e => constraint is resolved. There doesn't seem to be any instance around.

dcoutts avatar Mar 20 '17 09:03 dcoutts

@edsko looking at that use of catchChecked, I don't see how the Throws e => constraint is resolved. There doesn't seem to be any instance around.

That's the whole point. https://www.well-typed.com/blog/2015/07/checked-exceptions/ .

edsko avatar Mar 20 '17 09:03 edsko

fwiw, I'd suggest adding a link to that blogpost to the Hackage.Security.Util.Checked module (which provides catchChecked) as the concept is really a little bit non-obvious to the non-initiated... :-)

hvr avatar Mar 20 '17 09:03 hvr

It may also be good to have it wait for the lock to become available - https://github.com/commercialhaskell/stack/issues/3055#issuecomment-291018128

mgsloan avatar Apr 03 '17 20:04 mgsloan

Could this be the case why https://github.com/commercialhaskell/stack/issues/3055 happens even with just ctrl-c (sigint/sigterm)? I've seen it happen on CI, where I'd be surprised anything is calling SIGKILL.

domenkozar avatar Feb 13 '18 08:02 domenkozar

Managed to reproduce it with:

$ rm -rf ~/.cabal
$ cabal update
# wait 15s
ctrl-c

$ cabal update
Downloading the latest package list from hackage.haskell.org
/home/ielectric/.cabal/packages/hackage.haskell.org/hackage-security-lock: createDirectory: already exists (File exists)

domenkozar avatar Feb 13 '18 09:02 domenkozar

I think this is because of https://github.com/haskell/hackage-security/blob/71a24d63f776c62a532c6d68e56aca42ece7640e/hackage-security/src/Hackage/Security/Util/IO.hs#L33-L48

So this seems to just create a plain directory to implement a lock. Any form of createDirectory >> removeDirectory as done there cannot survive SIGKILL or computer crashes. I would have expected this to be implemented with flock().

nh2 avatar Feb 13 '18 09:02 nh2

@nh2 unfortunately flock() isn't portable (a problem we're having trouble with already in cabal; but I'm working on something to address that; once I'm done we can use it hackage-security too)

hvr avatar Feb 13 '18 09:02 hvr

https://hackage.haskell.org/package/filelock-0.1.1.2/docs/System-FileLock.html works well on Windows as long as you don't use lock paths for anything.

domenkozar avatar Feb 13 '18 09:02 domenkozar

@domenkozar I'm aware of the prior art (and there's a lot of history regarding file/record locking in Unix...) :-)

hvr avatar Feb 13 '18 10:02 hvr

@nh2 unfortunately flock() isn't portable (a problem we're having trouble with already in cabal; but I'm working on something to address that; once I'm done we can use it hackage-security too)

@domenkozar I'm aware of the prior art :-)

So what are you working on to address this?

chrisdone avatar Feb 13 '18 10:02 chrisdone

@domenkozar I'm aware of the prior art :-)

Ah sorry, I assume you don't due to your previous comment. I guess my implicit question was if we can use filelock to solve this properly and what would be the cons (besides obvious extra dependency) to fix this bug?

domenkozar avatar Feb 13 '18 10:02 domenkozar

I've written a commit to demonstrate one failure mode for directory-based file locking:

https://github.com/snoyberg/hackage-security/commit/ffb83f9f61969b8792a797058b04c68ab88cf112

Quoting the commit message:

Due to the nature of directory-based locking, if the process exits before exception handlers are run, the directory is never removed and the system remains in a "locked" state indefinitely. This can happen due to SIGKILL or power failure, as well as a blocked thread when main exits. This repro follows that last approach, forking a thread and then letting main exit while that locked thread delays.

Regarding the larger issue of async exception safety here: one possibility is to implement the same technique used in safe-exceptions in the Hackage.Security.Util.Checked module, namely:

  • Differentiate at the type level between sync and async exceptions using the SomeAsyncException type
  • Ensure that all exceptions passed to throwIO are synchronous
  • Disallow catchChecked, handleChecked, and tryChecked from catching any asynchronous exception.

Note that, AFAICT, the async exception safety issue is almost entirely orthogonal to the issue of using directory creation/removal as a form of locking.

snoyberg avatar Feb 13 '18 14:02 snoyberg

I've opened up #202 with the async exception detection logic I described above.

snoyberg avatar Feb 13 '18 14:02 snoyberg

I can also report that, with my patch in #202, Stack no longer has the buggy behavior described in commercialhaskell/stack#3073 (see my comment https://github.com/commercialhaskell/stack/issues/3073#issuecomment-365297649).

snoyberg avatar Feb 13 '18 15:02 snoyberg