Pipe blocking when input too large? (Mac OS)
In simulating (cat | less) using createProcess and CreatePipe, I am experiencing a weird threshold on the input to cat. This is shrunk from a real-world problem.
import System.IO
import System.Process

main = do
  (Just inp, Just out, _, ph1) <- createProcess $
    (proc "cat" [])
      { std_in  = CreatePipe
      , std_out = CreatePipe
      }
  -- Freezes when 'good' is replaced with 'bad' (one more line of 64 bytes)
  hPutStr inp good
  hClose inp
  (_, _, _, ph2) <- createProcess $
    (proc "less" [])
      { std_in = UseHandle out }
  waitForProcess ph1
  waitForProcess ph2
  putStrLn "Program terminated successfully."
  where
    good = unlines $ replicate 2304 $ replicate 63 'A'
    bad  = unlines $ replicate 2305 $ replicate 63 'A'
Using the good input, less shows up, presenting me the 2304 lines of 63 A's each.
Using the bad input, nothing shows up, and I can only Ctrl-C.
This is on Mac OS Mojave with GHC 9.0.1 and the latest process (1.6.13.2).
In my original setting, the pipe was nroff -man /dev/stdin | less and the threshold was exactly 192 KiB (192 * 1024 bytes).
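For reference, the sizes of the two test inputs can be checked directly. This is only a small sketch that reuses good and bad from the program above; each line is 63 characters plus a newline, and since the text is plain ASCII the character count equals the byte count.

```haskell
-- Sizes of the two test inputs from the program above.
good, bad :: String
good = unlines $ replicate 2304 $ replicate 63 'A'
bad  = unlines $ replicate 2305 $ replicate 63 'A'

main :: IO ()
main = do
  -- 2304 lines * 64 bytes = 147456 bytes = 144 * 1024
  putStrLn $ "good: " ++ show (length good) ++ " bytes"
  -- 2305 lines * 64 bytes = 147520 bytes, i.e. one 64-byte line more
  putStrLn $ "bad:  " ++ show (length bad) ++ " bytes"
```

So in this reduced example the freeze starts just above 144 KiB, whereas in the nroff setting it started at 192 KiB.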
Note that there is no problem if I let the OS do the piping (using shell "cat | less"):
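import System.IO
import System.Process

main = do
  (Just inp, _, _, p) <- createProcess $
    (shell "cat | less")
      { std_in = CreatePipe }
  hPutStr inp bad
  hClose inp
  waitForProcess p
  putStrLn "Program terminated successfully."
  where
    -- Same 'bad' input as in the first program
    bad = unlines $ replicate 2305 $ replicate 63 'A'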
The library is behaving correctly in this case. hPutStr fills up an OS pipe buffer and blocks once it is full. If the amount of data is small enough, the write never blocks and the second process can start. Once the data is large enough to fill the buffer, hPutStr blocks before less has been spawned, so nothing can drain the pipe.
When you use shell, the shell itself handles the asynchronous aspect of things: it starts cat and less together, so less can drain the pipe while the parent is still writing.
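The blocking write can be observed in isolation, without less. The following is a minimal sketch, not part of the library: writeCompletes is a hypothetical helper, and the 8 MiB size and 2-second timeout are arbitrary assumptions, chosen on the guess that 8 MiB exceeds the pipe buffers on common systems. It starts cat with both ends piped, never reads cat's output, and reports whether the write finished in time.

```haskell
import Data.Maybe (isJust)
import System.IO (hClose, hPutStr)
import System.Process
import System.Timeout (timeout)

-- | Hypothetical helper: write n bytes to a 'cat' whose output nobody reads.
-- Returns True if the write (and close) finished within 2 seconds.
writeCompletes :: Int -> IO Bool
writeCompletes n = do
  (Just inp, Just out, _, ph) <- createProcess
    (proc "cat" []) { std_in = CreatePipe, std_out = CreatePipe }
  -- Nobody drains 'out', so once the pipes are full this write blocks
  -- and the timeout fires.
  r <- timeout 2000000 (hPutStr inp (replicate n 'A') >> hClose inp)
  -- Clean up: stop cat and release the read end. Using 'out' here also keeps
  -- the handle alive, so it is not closed by the garbage collector earlier.
  terminateProcess ph
  _ <- waitForProcess ph
  hClose out
  pure (isJust r)

main :: IO ()
main = do
  small <- writeCompletes 1024                -- assumed to fit in the buffers
  large <- writeCompletes (8 * 1024 * 1024)   -- assumed to exceed them
  putStrLn $ "1 KiB write completed: " ++ show small
  putStrLn $ "8 MiB write completed: " ++ show large
```

On a typical Unix system the small write should complete and the large one should time out, but the exact threshold depends on the OS pipe buffer sizes and on cat's own buffering.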
Thanks for the explanation! Makes perfect sense now.
Consequently, here is the correct implementation of the pipe, which indeed works as expected:
import System.IO
import System.Process

main = do
  (Just inp, Just out, _, ph1) <- createProcess $
    (proc "cat" [])
      { std_in  = CreatePipe
      , std_out = CreatePipe
      }
  (_, _, _, ph2) <- createProcess $
    (proc "less" [])
      { std_in = UseHandle out }
  -- Now that less is attached to cat's output, writing the input no longer deadlocks:
  hPutStr inp longInput
  hClose inp
  waitForProcess ph1
  waitForProcess ph2
  putStrLn "Program terminated successfully."
  where
    longInput = unlines $ replicate 10000 $ replicate 63 'A'
Alternatively, writing to inp can be done concurrently, so that it does not block the creation of the pipe:
import Control.Concurrent.Async
import System.IO
import System.Process

main = do
  (Just inp, Just out, _, ph1) <- createProcess $
    (proc "cat" [])
      { std_in  = CreatePipe
      , std_out = CreatePipe
      }
  withAsync (hPutStr inp longInput >> hClose inp) $ \_ -> do
    (_, _, _, ph2) <- createProcess $
      (proc "less" [])
        { std_in = UseHandle out }
    waitForProcess ph1
    waitForProcess ph2
    putStrLn "Program terminated successfully."
  where
    longInput = unlines $ replicate 10000 $ replicate 63 'A'
I wonder whether this example could be added as a tutorial somewhere in the documentation (I can do this if it is welcome). Previously, I had googled a lot but did not find enough examples for CreatePipe.
I’m certainly up for such a doc addition. At the very least, a “make sure you don’t deadlock by filling buffers” note with a link to this issue would be great.
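As a possible starting point for such a note, here is a minimal sketch of the same pitfall with a single process: writing all of a large input before reading any output can fill both pipes and deadlock. The sketch avoids this by writing from a separate thread while the main thread drains the output. roundTrip is a hypothetical name used only for this illustration, not a function of the process library.

```haskell
import Control.Concurrent (forkIO)
import Control.Exception (evaluate)
import System.IO (hClose, hGetContents, hPutStr)
import System.Process

-- | Hypothetical helper: feed the input to 'cat' and collect what it echoes.
roundTrip :: String -> IO String
roundTrip input = do
  (Just inp, Just out, _, ph) <- createProcess
    (proc "cat" []) { std_in = CreatePipe, std_out = CreatePipe }
  -- Write in a separate thread, so the main thread is free to drain 'out'
  -- even when the input is larger than the pipe buffers.
  _ <- forkIO (hPutStr inp input >> hClose inp)
  output <- hGetContents out
  -- Force the lazy output completely before waiting for the process.
  _ <- evaluate (length output)
  _ <- waitForProcess ph
  pure output

main :: IO ()
main = do
  let big = unlines $ replicate 10000 $ replicate 63 'A'
  echoed <- roundTrip big
  putStrLn $ "round trip preserved the input: " ++ show (echoed == big)
```

The design choice is the same as in the withAsync version above: the write and the drain run concurrently, so neither side waits on a full buffer that nobody empties.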