
Prevent planner from locking up on large problems.

Open luke-clifton opened this issue 4 years ago • 7 comments

In some scenarios, the downward executable was producing a lot of output which was not being read by the library. This would eventually cause the downward process to hang, waiting for someone to consume its output. This change drains both output handles before calling wait, and thus prevents the deadlock.

luke-clifton avatar Mar 10 '20 05:03 luke-clifton

Another solution would be to not pipe the outputs at all, seeing as they aren't being used.
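
A minimal sketch of what "not piping" could look like, assuming a POSIX system where /dev/null exists. The name runDiscarding is hypothetical and this is not the library's actual code; it only uses the process package to point the child's stdout and stderr at /dev/null, so there is no pipe buffer to fill up.

```haskell
import System.Exit (ExitCode)
import System.IO (IOMode (WriteMode), withFile)
import System.Process
  ( CreateProcess (..)
  , StdStream (UseHandle)
  , proc
  , waitForProcess
  , withCreateProcess
  )

-- | Hypothetical helper: run a command with stdout and stderr sent to
-- /dev/null (POSIX-only assumption), then wait for it to exit.
-- Nothing is piped back to us, so the child can never block on a full pipe.
runDiscarding :: FilePath -> [String] -> IO ExitCode
runDiscarding cmd args =
  withFile "/dev/null" WriteMode $ \devNull -> do
    let spec =
          (proc cmd args)
            { std_out = UseHandle devNull
            , std_err = UseHandle devNull
            }
    -- No pipes were requested, so the three Maybe Handle arguments are Nothing.
    withCreateProcess spec $ \_ _ _ ph -> waitForProcess ph
```

The trade-off is that any diagnostics downward prints are lost, which may or may not be acceptable.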

luke-clifton avatar Mar 10 '20 05:03 luke-clifton

This is interesting, I'm not sure I've observed this at all in years of running this in production. Do you have any way to reproduce this?

ocharles avatar Mar 10 '20 09:03 ocharles

Set the list length as required to make downward spit out more than whatever your OS is allowing it to buffer. This patch will make it work.

import FastDownward
import FastDownward.Exec as Exec

data Add = Add Int

main :: IO ()
main = do
    let
        initial :: [Int]
        initial = [0..500]
    res <- runProblem $ do
        vars <- mapM newVar initial
        solve Exec.bjolp
            (map (\i -> modifyVar (vars !! i) negate) initial)
            (zipWith (?=) vars (map negate initial))
    print res

luke-clifton avatar Mar 10 '20 11:03 luke-clifton

Thanks!

ocharles avatar Mar 10 '20 11:03 ocharles

You can tell it has hung because of this (and not because the problem is too large) by checking your CPU: it will not be busy, because all processes are sleeping. downward is waiting for your app to read from the pipes, and your app is waiting for downward to finish.

luke-clifton avatar Mar 10 '20 11:03 luke-clifton

Did you manage to reproduce? 90 was enough to overflow the buffers on macOS and on my NixOS machine. That is, 90 vars, with 90 effects.

I've stuck the complete example in a gist.

https://gist.github.com/luke-clifton/29df0f6cc0664a3eaa64e6e433506cc5

Technically, one should really be reading from both of those handles at the same time. If the stderr handle fills up while we are still trying to read the stdout handle, downward would block as well. You can witness this by swapping the order in which this patch reads the two handles and seeing that it locks up again.

To combat that, we could possibly use Control.Concurrent.Async.concurrently:

  (stderr, stdout) <- concurrently
        (Data.Text.IO.hGetContents stderrHandle)
        (Data.Text.IO.hGetContents stdoutHandle)

That would be totally safe, at the cost of adding async as a direct dependency (it is already a transitive one, though).
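
If the extra dependency is a concern, the same idea can be sketched with only base and process. The following is an illustration under assumptions, not the patch itself: runDraining is a hypothetical name, and the child's output is read as String rather than Text. Each pipe gets its own reader thread, so neither can fill up while we are blocked on the other.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import System.Exit (ExitCode)
import System.IO (Handle, hGetContents)
import System.Process
  ( CreateProcess (..)
  , StdStream (CreatePipe)
  , createProcess
  , proc
  , waitForProcess
  )

-- | Hypothetical helper: run a command, draining stdout and stderr on
-- separate threads, and only then wait for the process to exit.
runDraining :: FilePath -> [String] -> IO (ExitCode, String, String)
runDraining cmd args = do
  (_, Just hOut, Just hErr, ph) <-
    createProcess
      (proc cmd args) { std_out = CreatePipe, std_err = CreatePipe }
  outVar <- newEmptyMVar
  errVar <- newEmptyMVar
  let drain :: Handle -> (String -> IO ()) -> IO ()
      drain h done = do
        s <- hGetContents h
        _ <- evaluate (length s) -- force the lazy read to reach end-of-file
        done s
  _ <- forkIO (drain hOut (putMVar outVar))
  _ <- forkIO (drain hErr (putMVar errVar))
  -- Both pipes are consumed concurrently; block until each reader finishes.
  out <- takeMVar outVar
  err <- takeMVar errVar
  code <- waitForProcess ph
  pure (code, out, err)
```

This sketch does not handle exceptions in the reader threads (a failed read would leave the corresponding MVar empty), which is part of what concurrently would take care of.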

luke-clifton avatar Mar 11 '20 04:03 luke-clifton

I'm afraid I haven't had a chance to look yet. I hope to look soon.

ocharles avatar Mar 11 '20 10:03 ocharles