wicked_pdf icon indicating copy to clipboard operation
wicked_pdf copied to clipboard

WIP: Use POSIX::Spawn::Child instead of #popen3 to avoid deadlocks

Open nachokb opened this issue 6 years ago • 10 comments

Problem

We are seeing No live threads left. Deadlock? in production a lot. This is not easy to reproduce, and it only happens under load.

After upgrading to Ruby 2.4.2 and Rails 5.1 we started seeing this much more often.

Cause

Incorrect use of #popen3 (whose API is badly designed). Particularly,

Open3.popen3(*command) do |_stdin, _stdout, stderr|
  stderr.read
end

This produces a deadlock in case the command filled the whole stdout buffer. In that case, the subprocess is waiting for that buffer to be consumed, and the parent process is waiting for stderr's buffer to have content.

Ruby core team's position is that this should be handled with IO.select (check ref 1).

Solution

First alternative would be to use IO.select. But that sounds error-prone to me.

I decided to use the posix-spawn gem (particularly POSIX::Spawn::Child which handles this correctly). This behavior is optional (but default). This may break Windows compatibility (check Notes)

References

  1. https://bugs.ruby-lang.org/issues/9082
  2. nicer explanation: http://coldattic.info/post/63/

Notes

  1. I'm not sure about posix-spawn Windows support so I decided to make this behavior the default, but optional. Setting WICKED_POPEN env variable to ruby uses the standard library's Open3 instead. Gem dependencies still install posix-spawn, so it may still fail on Windows.
  2. I couldn't make Travis run tests successfully yet. Not even master as it is today upstream.
  3. This is a WIP. My intention is to bring this to your attention. Any feedback is welcome.

nachokb avatar May 11 '18 03:05 nachokb

Thank you for this. It sounds great. Have you been running this fork in your own apps and seen an improvement?

For an introductory transition period, I think it's be great if we could add an option to WickedPdf.config to opt into this, rather than an ENV variable, and making this option the default in a later release.

I just fixed up the test suite. I had a dependency that wasn't pinned to a version update. If you rebase, you should be able to run the tests locally or on Travis-CI.

unixmonkey avatar May 21 '18 22:05 unixmonkey

sorry I missed the notification; I'll do the config change in a few hours (thanks for the tip, I honestly didn't know);

and dependencies versions… it happens

nachokb avatar Jun 12 '18 20:06 nachokb

sorry, I didn't answer: yes, I've been using it in production for ~20 days and no deadlocks (had 1 - 2 per week before)

nachokb avatar Jun 12 '18 20:06 nachokb

had 1 - 2 per week before

actually, historically we were seeing 1-2 per month (we process ~1000 PDFs a month), say ~0.2%; after an upgrade (Rails 4.1 -> 5.1, Ruby 2.1 -> 2.4) and a ~20% growth we started seeing ~1-2 per week, say ~0.6% (every once in a while we had weird spikes, which is the reason that I had to do this)

nachokb avatar Jun 12 '18 20:06 nachokb

Just curious why not use Open3.capture3 as that ruby-lang ticket suggested?

allenwu1973 avatar Aug 07 '18 03:08 allenwu1973

@allenwu1973 there are other advantages using posix-spawn.
See it's readme for more information.

midnight-wonderer avatar Sep 24 '18 16:09 midnight-wonderer

@MidnightWonderer basically I missed it BUT not even their authors had noticed it. It may be useful as a fallback.

nachokb avatar Oct 04 '18 01:10 nachokb

would be great to see a switch to posix-spawn

perhaps terrapin could help make the code simpler

https://github.com/thoughtbot/terrapin

It gives access to stdin and stderr - it returns this object (not documented in readme i believe)

https://github.com/thoughtbot/terrapin/blob/107e14937fd9aa710a79ceb3f304e168dae20496/lib/terrapin/command_line/output.rb#L1

not sure how to get access to that if process doesn't return 0, because terrapin raises an error in that case

you can pass in a logger to get ongoing output. not sure if that's out, or out+err. see "You can see what's getting run" in readme

jjb avatar Feb 22 '22 22:02 jjb

Years passed, I am surprised the thread has some updates. I moved to puppeteer already, which is more mainstream. It's weird, as a contemplation, that probably all the old posters here don't care about the issue anymore.

midnight-wonderer avatar Mar 09 '22 04:03 midnight-wonderer

@midnight-wonderer LOL, I just found your very helpful comment over on posix-spawn, and it looks like this PR should be closed!

pboling avatar Mar 09 '24 00:03 pboling