hGetContents closes handle
hGetContents doesn't create Handle and therefore it is not obvious why it closes it. Handle is closed on a bracket.
data GitObject
= GitObject
{ gobHash :: !(Digest SHA1State)
, gobOrigin :: !FilePath
}
deriving (Show, Eq)
mkGitObject :: PhoenixM m => FilePath -> m (Maybe GitObject)
mkGitObject fp = do
s <- asks inHandlesSem
U.bracket_ (U.waitQSem s) (U.signalQSem s) $
U.withBinaryFile fp U.ReadMode $ \inH -> do
magicBs <- hGet inH 2
if zlibP magicBs
then do
!headerBs <- (magicBs <>) <$> hGet inH 10
(`U.catch` skipCorruptedFile inH) $ do
if gitObjectP $ Z.decompress (toLazy headerBs)
then do
!goh <- sha1 . Z.decompress . (toLazy headerBs <>) <$> hGetContents inH
pure . Just $ GitObject goh fp
else pure Nothing
else pure Nothing
where
skipCorruptedFile inH (e :: Z.DecompressError) =
case e of
Z.TruncatedInput -> do
fsz <- liftIO $ U.hFileSize inH
Data.ByteString.Lazy.hGetContents implements a lazy IO and as such is not suitable to be used inside bracket. For instance,
ghci> import System.IO
ghci> import Data.ByteString.Lazy qualified as BL
ghci> withBinaryFile "foo.txt" ReadMode BL.hGetContents >>= print
"*** Exception: foo.txt: hGetBufSome: illegal operation (handle is closed)
That's because the moment BL.hGetContents runs unsafeInterleaveIO it passes control to withBinaryFile which readily closes the handle. When later on the result of Bl.hGetContents is being forced, BL.hGetContents tries to read from the closed handle and fails.
The expected way to use BL.hGetContents is to pass it a handle and let it close it on its own accord. This is the documented behaviour:
https://github.com/haskell/bytestring/blob/10099d9a412ce57ce72b6ac72039175b690193d8/Data/ByteString/Lazy.hs#L1602-L1603
A variant of BL.hGetContents as proposed in #708 which does not close the handle seems to me even more dangerous than the original: it still cannot be used under bracket because lazy IO will escape again for the same reason of unsafeInterleaveIO, but also now a developer is responsible to put hClose manually exactly at the spot where ByteString has been consumed in full. While helpful in marginal cases as presented above, when a lazy ByteString is consumed strictly and immediately, in general it would present a user with a nuanced choice between two perilous functions and as such be a net negative.
All in all lazy IO struggles when precise resource management is needed. If the latter is a requirement, I'd use Data.Conduit.Zlib, Data.Streaming.Zlib, Codec.Compression.Zlib.Internal or similar. As a maintainer of zlib I'm open to add a high level helper, something like
decompressAndReduceFromHandle :: Handle -> (acc -> S.ByteString -> IO acc) -> acc -> IO (acc, S.ByteString)
(where the second component of return stands for trailer after the end of archive).
https://github.com/yaitskov/lazy-scope