wicked_pdf
wicked_pdf copied to clipboard
WIP: Use POSIX::Spawn::Child instead of #popen3 to avoid deadlocks
Problem
We are seeing No live threads left. Deadlock?
in production a lot. This is not easy to reproduce, and it only happens under load.
After upgrading to Ruby 2.4.2 and Rails 5.1 we started seeing this much more often.
Cause
Incorrect use of #popen3
(whose API is badly designed). Particularly,
Open3.popen3(*command) do |_stdin, _stdout, stderr|
stderr.read
end
This produces a deadlock in case the command filled the whole stdout
buffer. In that case, the subprocess is waiting for that buffer to be consumed, and the parent process is waiting for stderr
's buffer to have content.
Ruby core team's position is that this should be handled with IO.select
(check ref 1).
Solution
First alternative would be to use IO.select
. But that sounds error-prone to me.
I decided to use the posix-spawn
gem (particularly POSIX::Spawn::Child
which handles this correctly). This behavior is optional (but default). This may break Windows compatibility (check Notes)
References
- https://bugs.ruby-lang.org/issues/9082
- nicer explanation: http://coldattic.info/post/63/
Notes
- I'm not sure about
posix-spawn
Windows support so I decided to make this behavior the default, but optional. SettingWICKED_POPEN
env variable toruby
uses the standard library'sOpen3
instead. Gem dependencies still install posix-spawn, so it may still fail on Windows. - I couldn't make Travis run tests successfully yet. Not even
master
as it is today upstream. - This is a WIP. My intention is to bring this to your attention. Any feedback is welcome.
Thank you for this. It sounds great. Have you been running this fork in your own apps and seen an improvement?
For an introductory transition period, I think it's be great if we could add an option to WickedPdf.config
to opt into this, rather than an ENV variable, and making this option the default in a later release.
I just fixed up the test suite. I had a dependency that wasn't pinned to a version update. If you rebase, you should be able to run the tests locally or on Travis-CI.
sorry I missed the notification; I'll do the config
change in a few hours (thanks for the tip, I honestly didn't know);
and dependencies versions… it happens
sorry, I didn't answer: yes, I've been using it in production for ~20 days and no deadlocks (had 1 - 2 per week before)
had 1 - 2 per week before
actually, historically we were seeing 1-2 per month (we process ~1000 PDFs a month), say ~0.2%; after an upgrade (Rails 4.1 -> 5.1, Ruby 2.1 -> 2.4) and a ~20% growth we started seeing ~1-2 per week, say ~0.6% (every once in a while we had weird spikes, which is the reason that I had to do this)
Just curious why not use Open3.capture3
as that ruby-lang ticket suggested?
@allenwu1973 there are other advantages using posix-spawn.
See it's readme for more information.
@MidnightWonderer basically I missed it BUT not even their authors had noticed it. It may be useful as a fallback.
would be great to see a switch to posix-spawn
perhaps terrapin could help make the code simpler
https://github.com/thoughtbot/terrapin
It gives access to stdin and stderr - it returns this object (not documented in readme i believe)
https://github.com/thoughtbot/terrapin/blob/107e14937fd9aa710a79ceb3f304e168dae20496/lib/terrapin/command_line/output.rb#L1
not sure how to get access to that if process doesn't return 0, because terrapin raises an error in that case
you can pass in a logger to get ongoing output. not sure if that's out, or out+err. see "You can see what's getting run" in readme
Years passed, I am surprised the thread has some updates. I moved to puppeteer already, which is more mainstream. It's weird, as a contemplation, that probably all the old posters here don't care about the issue anymore.
@midnight-wonderer LOL, I just found your very helpful comment over on posix-spawn
, and it looks like this PR should be closed!