"asynchronous generator is already running" when using simulator process on Python <= 3.12

Open agrif opened this issue 1 month ago • 8 comments

(Originally posted to #1590, but in a new issue for visibility and tracking.)

I'm getting RuntimeErrors during simulation when I use a background process that does not complete by the time the simulation ends. Here is a small example:

import amaranth as am
import amaranth.sim

async def deadline_checker(ctx):
    await ctx.tick().repeat(100)
    raise RuntimeError('deadline')

m = am.Module()
counter = am.Signal(8)
m.d.sync += counter.eq(counter + 1)

sim = am.sim.Simulator(m)
sim.add_process(deadline_checker)

sim.run()

Under Python 3.13.5, this works fine. With Python 3.12.8, I get

Exception ignored in: <coroutine object deadline_checker at 0x7f433c8c9f00>
Traceback (most recent call last):
  File "/home/agrif/devel/fpga/simulator_aclose.py", line 7, in deadline_checker
  File "/home/agrif/devel/fpga/local/nara-env-312/lib/python3.12/site-packages/amaranth/sim/_async.py", line 414, in repeat
RuntimeError: aclose(): asynchronous generator is already running

While reading #1590, I was confused by why aclose(...) is grumpy, so I dug around. I don't get much from this, but maybe someone else will (or me, later).

agrif, Nov 22 '25 23:11

Please research this and send a PR! I also wonder why this wasn't caught by tests...

whitequark, Nov 22 '25 23:11

@whitequark is there a test or small example that generates the warning fixed in #1590? I can't seem to get it even after removing aclose(), but I'm also not sure how much asyncio to mix in and where.

agrif, Nov 23 '25 17:11

Oh, I remember now what happened: the use downstream in Glasgow ran into issues. I haven't been able to find which Glasgow commit (if any) corresponds to this fix; I think probably the easiest way to do this is to run pdm test in Glasgow after reverting this commit. Sorry!

whitequark, Nov 24 '25 05:11

This one is a mess. I don't have a solution, but I can at least make what I've learned public.

Async Generator Hooks

  • aclose() is a method on async generators to make sure any clean-up code in the generator runs, even if the generator is not run to completion. Non-async generators do this automatically when they're garbage collected. Async generators may await during cleanup, so they get a different solution.
  • Async generators call sys.get_asyncgen_hooks() when first iterated, and store it. When they are GC'd, they call hooks.finalizer(self). Event loops are expected to register themselves with sys.set_asyncgen_hooks(), and use the finalizer to schedule the generator's aclose() in an async context.
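To make the hook mechanism concrete, here is a minimal sketch that uses only the standard library. The names `numbers`, `step`, and `record_finalizer` are made up for illustration, and the generator is driven by hand rather than by an event loop or by Amaranth's simulator. It only shows that the finalizer is captured on first iteration and invoked when the unfinished generator is collected.

```python
import gc
import sys

finalized = []

def record_finalizer(agen):
    # Invoked by the interpreter when a hooked, unfinished async generator
    # is about to be garbage collected. A real event loop would schedule
    # agen.aclose() here; this sketch only records that the hook fired.
    finalized.append(agen.__name__)

async def numbers():
    yield 1
    yield 2

def step(agen):
    """Drive one __anext__() by hand. The generator never awaits anything,
    so the awaitable finishes immediately and reports the yielded value
    through StopIteration."""
    try:
        agen.__anext__().send(None)
    except StopIteration as stop:
        return stop.value

old_hooks = sys.get_asyncgen_hooks()
sys.set_asyncgen_hooks(finalizer=record_finalizer)
try:
    agen = numbers()
    first = step(agen)  # the hooks are captured on this first iteration
    del agen            # drop the last reference while the generator is suspended
    gc.collect()
finally:
    sys.set_asyncgen_hooks(*old_hooks)  # restore whatever was installed before

print(first, finalized)
```

Because this finalizer never calls `aclose()`, any clean-up code in the generator would be skipped, which is the situation the hooks exist to avoid.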

Adding Hooks to Simulator

Initially I thought Amaranth's simulator just needs to implement these hooks. This solution (by itself) doesn't work on Python <= 3.12:

  • Async generators can only have one outstanding operation at a time. If code is already waiting on __anext__(), then calling aclose() (or asend(), etc.) will fail:
    RuntimeError: aclose(): asynchronous generator is already running
    
  • If a process coroutine is GC'd while waiting on an async generator, that generator will stay in the 'running' state forever, even inside its own finalizer hook. This makes it impossible to ever call aclose().
  • If the simulator explicitly cancels the process and testbench coroutines, this might knock the generator out of the 'running' state. Doing this might need an API change for Simulator. I need to look into this.
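The "one outstanding operation" rule from the first bullet can be reproduced without a simulator or event loop. This is a hand-driven sketch: `suspend` and `ticks` are hypothetical helpers, not Amaranth code. It covers only the explicit case where aclose() is called while an `__anext__()` is pending, not the garbage-collection case.

```python
import types

@types.coroutine
def suspend():
    # A bare yield hands control back to whoever is driving the coroutine,
    # standing in for "waiting on the simulator".
    yield

async def ticks():
    while True:
        await suspend()
        yield "tick"

agen = ticks()

# Start an __anext__() and leave it pending at the await inside the generator.
pending = agen.__anext__()
pending.send(None)

# The generator now counts as running, so a second operation is refused.
try:
    agen.aclose().send(None)
except RuntimeError as exc:
    message = str(exc)

# Once the pending operation completes, the generator can be closed normally.
try:
    pending.send(None)
except StopIteration as stop:
    value = stop.value

try:
    agen.aclose().send(None)
except StopIteration:
    closed = True

print(message)
```

The refused aclose() leaves the generator itself untouched, which is why the pending `__anext__()` can still be completed afterwards.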

In contrast, Python >= 3.13 attempts to close async generators automatically, without using the hooks. (This commit and probably more.) In these versions, everything seems to just work fine with no changes. I still can't reproduce the original warning, though, so I'm not sure.

Possible solutions so far

Not a huge fan of any of these:

  • Just don't use async generators, and hope users don't either.
  • Use async generators but accept that clean-up code inside may never run.
  • Implement the hooks and explicitly cancel running coroutines rather than let them fall out of scope. Forgetting to cancel will mean clean-up code in async generators might never run.

I'm still looking for a clean solution, but what I've learned so far does not spark joy.

agrif, Nov 26 '25 01:11

  • Just don't use async generators, and hope users don't either.

I may be mixing up some terminology, but doesn't the new simulator's design require the use of async generators?

  • Use async generators but accept that clean-up code inside may never run.

I think it's OK to say "if you want async generators to be cleaned up properly you must use Python 3.13+".

whitequark, Nov 26 '25 03:11

By "async generators" I mean functions that use both "await" and "yield". Python calls those "async generators", while other async functions are "coroutines" even if internally I think they're still implemented as (non-async) generators.

Right now, Amaranth uses only two of these.
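For anyone else tripping over the terminology, the distinction can be checked with the standard `inspect` module. The two functions below are throwaway examples, not Amaranth code.

```python
import inspect

async def plain():
    # "async def" without "yield": calling this returns a coroutine object.
    return 1

async def ticker():
    # "async def" with "yield": calling this returns an async generator.
    yield 1

is_coro_fn = inspect.iscoroutinefunction(plain)
is_agen_fn = inspect.isasyncgenfunction(ticker)
ticker_is_coro_fn = inspect.iscoroutinefunction(ticker)

coro = plain()
agen = ticker()
# Only the async generator object has aclose(); coroutines have close().
coro_has_aclose = hasattr(coro, "aclose")
agen_has_aclose = hasattr(agen, "aclose")
coro.close()  # avoid a "never awaited" warning for the unused coroutine

print(is_coro_fn, is_agen_fn, ticker_is_coro_fn, coro_has_aclose, agen_has_aclose)
```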

I think it's OK to say "if you want async generators to be cleaned up properly you must use Python 3.13+".

I agree, that sounds like the best tradeoff right now. And probably the simulator should register a finalizer hook to attempt aclose, since that's what event loops are expected to do.

agrif, Nov 26 '25 03:11

I agree, that sounds like the best tradeoff right now. And probably the simulator should register a finalizer hook to attempt aclose, since that's what event loops are expected to do.

Agreed. Will you be able to implement it?

whitequark, Nov 26 '25 03:11

Yes.

I still can't reproduce the original warning, but I feel comfortable moving aclose into a finalizer. As long as it's called at some point in its lifetime, that should prevent the warning.
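As a rough sketch of the "attempt aclose in a finalizer" idea, and not the actual Amaranth change: `try_aclose` and `asyncgen_finalizer_installed` are hypothetical names. The sketch assumes the generator's clean-up code does not await anything, so aclose() can be driven synchronously inside the hook.

```python
import contextlib
import gc
import sys

def try_aclose(agen):
    """Finalizer hook: attempt aclose() and give up quietly if it is refused."""
    closer = agen.aclose()
    try:
        closer.send(None)
    except StopIteration:
        pass  # clean-up code ran to completion
    except RuntimeError:
        pass  # e.g. "asynchronous generator is already running"
    # If send() returns instead of raising, the clean-up code awaited
    # something; a real simulator would have to schedule `closer` rather
    # than drop it. This sketch does not handle that case.

@contextlib.contextmanager
def asyncgen_finalizer_installed():
    """Install the hook for the duration of a run, then restore the old hooks."""
    old_hooks = sys.get_asyncgen_hooks()
    sys.set_asyncgen_hooks(finalizer=try_aclose)
    try:
        yield
    finally:
        sys.set_asyncgen_hooks(*old_hooks)

log = []

async def resource():
    try:
        yield "ready"
    finally:
        log.append("cleaned up")

with asyncgen_finalizer_installed():
    agen = resource()
    try:
        agen.__anext__().send(None)  # first iteration captures the hook
    except StopIteration as stop:
        log.append(stop.value)
    del agen      # generator is suspended at the yield, not running
    gc.collect()

print(log)
```

Here the generator is merely suspended when it is collected, so the clean-up runs. In the stuck "running" case described earlier in the thread, the `RuntimeError` branch would be taken instead and the clean-up would still be skipped.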

agrif, Nov 26 '25 03:11