func never called after trio.to_thread.run_sync(func, obj) (only for trio >= 0.16)

Open · paugier opened this issue 3 years ago · 4 comments

I maintain a Python package (https://foss.heptapod.net/fluiddyn/fluidimage) that uses Trio and works correctly with trio 0.15. Unfortunately, I started seeing buggy behavior with trio >= 0.16.

A function passed to trio.to_thread.run_sync seems never to be called, so the program never ends. Of course, this happens only in particular situations that I haven't been able to reproduce with very simple code.

The problem seems to occur when the code is run several times, first in the main process and then in other processes created with multiprocessing. These situations mostly arise in testing, but I have had to avoid trio > 0.15, which is becoming annoying, for example to support newer versions of Python. So now I really need to understand the problem.

The call to trio.to_thread.run_sync that never leads to a call of the synchronous function looks like this (https://foss.heptapod.net/fluiddyn/fluidimage/-/blob/branch/default/fluidimage/executors/exec_async.py#L238):

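    async def async_run_work_io(self, work):
        """Is destined to be started with a "trio.start_soon"."""
        obj = ...
        # In the buggy case, work.func_or_cls is never called in the worker
        # thread, so this await never returns.
        ret = await trio.to_thread.run_sync(work.func_or_cls, obj)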
  • async_run_work_io is called in def_async_func_work_io (https://foss.heptapod.net/fluiddyn/fluidimage/-/blob/branch/default/fluidimage/executors/exec_async.py#L162)

  • def_async_func_work_io is called in start_async_works (https://foss.heptapod.net/fluiddyn/fluidimage/-/blob/branch/default/fluidimage/executors/exec_async.py#L97), where the nursery is opened.

I looked at the release notes (https://trio.readthedocs.io/en/stable/history.html#trio-0-16-0-2020-06-10) to try to understand the issue. Some changes related to trio.to_thread.run_sync are mentioned, but they don't help me fix the problem.

Do you have any clues?

paugier · Sep 22 '21 08:09

What OS is this? Does it change anything if you do multiprocessing.set_start_method("spawn") at the top of your program, before doing anything else?

njsmith · Sep 22 '21 08:09

This is with Linux (Ubuntu 20.04).

I can't really try "spawn": calling multiprocessing.set_start_method("spawn") leads to AttributeError: Can't pickle local object 'MultiExecutorAsync.launch_process.<locals>.init_and_compute'.
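For context on that error: with the "spawn" start method, multiprocessing pickles the process target, and pickle stores functions by reference (module plus qualified name), so a function defined inside another function (the `<locals>` in the message) cannot be pickled. Below is a minimal sketch of the usual fix: move the function to module level and bind its arguments with `functools.partial`. The names only mimic `launch_process`/`init_and_compute` and are hypothetical; this is not fluidimage's code.

```python
import functools
import pickle


def launch_process_closure(n):
    # Hypothetical stand-in for a method that defines a local function.
    def init_and_compute():
        return n * 2

    return init_and_compute


def init_and_compute(n):
    # Module-level equivalent: pickled by reference as __main__.init_and_compute.
    return n * 2


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError):
        return False
    return True


local_ok = is_picklable(launch_process_closure(21))
module_level_ok = is_picklable(functools.partial(init_and_compute, 21))
print(local_ok, module_level_ok)
```

The sketch catches both AttributeError (the type shown in the report above) and pickle.PicklingError, so it doesn't depend on which of the two a given Python version raises for local objects.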

paugier · Sep 22 '21 08:09

Gotcha. So this is almost certainly a bad interaction between fork and threads, similar to #1614. In particular, I think you're hitting a lack of fork-safety inside trio's thread cache. To confirm, try adding this janky workaround at the beginning of your program. Does it fix your problem?

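import os

import trio


def afterfork():
    # Private trio internals: forget the idle worker threads inherited from
    # the parent, since those threads do not exist in the forked child.
    trio._core._thread_cache.THREAD_CACHE._idle_workers.clear()


# Call afterfork() in every child process created by os.fork().
os.register_at_fork(after_in_child=afterfork)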
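Why clearing `_idle_workers` helps can be illustrated with a toy model. The class below is a hypothetical, much-simplified thread cache, not trio's actual code: idle workers are remembered in a list, and a job is handed to an idle worker instead of starting a new thread. After `os.fork()`, only the forking thread exists in the child, but the child still inherits the list of "idle workers", so a job handed to one of them is never run. The sketch assumes a POSIX system with `os.fork`.

```python
import os
import queue
import threading
import time


class ToyThreadCache:
    """Hypothetical model of a cache of reusable worker threads (not trio's)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._idle_workers = []  # one job queue per worker waiting for work

    def _worker(self, jobs):
        while True:
            job = jobs.get()  # block until a job is handed to this worker
            job()
            with self._lock:
                self._idle_workers.append(jobs)  # mark this worker idle again

    def submit(self, job):
        with self._lock:
            jobs = self._idle_workers.pop() if self._idle_workers else None
        if jobs is None:  # no idle worker: start a new daemon thread
            jobs = queue.Queue()
            threading.Thread(target=self._worker, args=(jobs,), daemon=True).start()
        jobs.put(job)


def job_runs_in_forked_child(cache, timeout=1.0):
    """Fork, submit one job from the child, report whether it ran in time."""
    pid = os.fork()
    if pid == 0:  # child: only the thread that called fork() exists here
        ran = False
        try:
            done = threading.Event()
            cache.submit(done.set)
            ran = done.wait(timeout)
        finally:
            os._exit(0 if ran else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


cache = ToyThreadCache()

# Warm up the cache in the parent so that exactly one worker ends up idle.
warm = threading.Event()
cache.submit(warm.set)
warm.wait()
while True:
    with cache._lock:
        if cache._idle_workers:
            break
    time.sleep(0.01)
time.sleep(0.2)  # let the worker block in jobs.get() again before forking

# Without a hook, the child reuses a queue whose thread did not survive fork().
hangs_without_hook = not job_runs_in_forked_child(cache)

# Same idea as the workaround: forget inherited idle workers in the child.
os.register_at_fork(after_in_child=cache._idle_workers.clear)
runs_with_hook = job_runs_in_forked_child(cache)

print(hangs_without_hook, runs_with_hook)
```

Both flags should come out True: without the hook the child's job times out, and with the hook the child starts a fresh thread instead. Recent Python versions may also emit a DeprecationWarning about calling fork() in a multi-threaded process, which is the very hazard being modeled.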

Also, I'm surprised you're not running into the other issues mentioned in #1614 – in particular, if you use multiprocessing in fork mode from inside trio.run, and then try to call trio.run again, that normally gives an error. I guess you must be using multiple calls to trio.run?

njsmith · Sep 24 '21 09:09

Yes, I can confirm that it fixes the problem.

With this workaround, our test suite passes (although with a PytestUnhandledThreadExceptionWarning related to trio that I don't understand).

Would it be safe to just add this to our package?

There are indeed multiple calls to trio.run (not nested, though), so I don't understand why I don't hit #1614. Maybe it's because there is at most one call to trio.run per process launched with multiprocessing?

paugier · Sep 24 '21 21:09