burr icon indicating copy to clipboard operation
burr copied to clipboard

Pool based async persister

Open gamarin2 opened this issue 7 months ago • 1 comments

This PR adds a way to instantiate the async postgres persister with a connection pool. The goal is to prevent a bug in parallelism: when the user implements async versions of map-reduce classes (i.e. set is_async to True), burr calls asyncio.gather instead of threads, meaning we can't reuse the connection of the async persister.

One solution is to update b_asyncpg to use a connection pool instead of a single connection and add a copy method to it, meaning that when burr cascades to get persisters for each sub-app, we get a new connection from the pool each time.

Changes

  • Added a way to instantiate AsyncPostgresPersister with a pool instead of a connection
  • Added a copy method that favors using the pool in copied instances if self.pool is not None
  • Kept everything backwards compatible
  • Also added missing async related docs

How I tested this

Manual testing

Notes

So far I only implemented a proposed solution in b_asyncpg, but the issue is still present in other async persister implementations. Once we agree on a definitive solution, I can update them as well.

Open question:

  • Do we really want to keep things backwards compatible and let user instantiate with a simple connection? In most scenarios a pool is safer, especially in async context

Checklist

  • [ ] PR has an informative and human-readable title (this will be pulled into the release notes)
  • [ ] Changes are limited to a single goal (no scope creep)
  • [ ] Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • [ ] Any change in functionality is tested
  • [ ] New functions are documented (with a description, list of inputs, and expected output)
  • [ ] Placeholder code is flagged / future TODOs are captured in comments
  • [ ] Project documentation has been updated if adding/changing functionality.

[!IMPORTANT] Add connection pool support to AsyncPostgreSQLPersister for better parallelism handling in async contexts.

  • Behavior:
    • AsyncPostgreSQLPersister can now be instantiated with a connection pool using from_values() with use_pool=True.
    • Adds copy() method to AsyncPostgreSQLPersister to create new instances using the pool.
    • Updates _get_connection() and _release_connection() to handle pooled connections.
  • Documentation:
    • Updates actions.rst, parallelism.rst, and sync-vs-async.rst to include information about using connection pools with async persisters.
  • Misc:
    • Maintains backward compatibility with direct connections.
    • Fixes minor formatting issues in b_asyncpg.py.

This description was created by Ellipsis for c4d6293bb3c9dadbd9ea387fab6e5b1687167aaf. You can customize this summary. It will automatically update as commits are pushed.

gamarin2 avatar May 06 '25 17:05 gamarin2

I pushed fixes for the pre-commit hooks. I'd still like to know your opinion on the following @elijahbenizzy : I have kept things backwards compatible, but should we really let people instantiate async persisters with direct connections? This means people that use this will still run into the issue when using parallelism. In async setting, I would argue it'd be better to "force" connection pools to be used

Regarding tests, there are already some in the test folder that I can draw inspiration from, but if you have something specific in mind lmk

gamarin2 avatar May 19 '25 18:05 gamarin2

I pushed fixes for the pre-commit hooks. I'd still like to know your opinion on the following @elijahbenizzy : I have kept things backwards compatible, but should we really let people instantiate async persisters with direct connections? This means people that use this will still run into the issue when using parallelism. In async setting, I would argue it'd be better to "force" connection pools to be used

Regarding tests, there are already some in the test folder that I can draw inspiration from, but if you have something specific in mind lmk

@gamarin2 Ok, I'm not sure this is "backwards compatible" as much as "preserves prior behavior". E.G. will anyone's code break if this is released and the default is set to True rather than False for use_pool?

I don't think so, although you have a better sense.

If not: let's set the default to the optimal behavior and save people a headache. If so: let's think more carefully about this -- we're still in 0.X.X so we have the right, but we want to be smart about this.

@jernejfrank -- any thoughts?

elijahbenizzy avatar May 26 '25 22:05 elijahbenizzy

Mergine despite build failures @jernejfrank has agreed to fix!

elijahbenizzy avatar Jul 21 '25 04:07 elijahbenizzy