
[ENH] Connection pool FD leak v2

Open tazarov opened this issue 10 months ago • 4 comments

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Using weakrefs in pools' connections set.

Test plan

How are these changes tested?

  • [x] Tests pass locally with pytest for python, yarn test for js

Documentation Changes

N/A

Refs

  • https://peps.python.org/pep-0567/

tazarov avatar Apr 14 '24 06:04 tazarov

  • #2014 Graphite 👈
  • main

This stack of pull requests is managed by Graphite. Learn more about stacking.


tazarov avatar Apr 14 '24 06:04 tazarov

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • [ ] Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • [ ] Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • [ ] If appropriate, are there adequate property based tests?
  • [ ] If appropriate, are there adequate unit tests?
  • [ ] Should any logging, debugging, tracing information be added or removed?
  • [ ] Are error messages user-friendly?
  • [ ] Have all documentation changes needed been made?
  • [ ] Have all non-obvious changes been commented?

System Compatibility

  • [ ] Are there any potential impacts on other parts of the system or backward compatibility?
  • [ ] Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • [ ] Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

github-actions[bot] avatar Apr 14 '24 06:04 github-actions[bot]

Needs a test to verify that the file descriptors are being closed.

Tests might be flaky due to weakrefs relying on GC.
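One way to keep such a test deterministic is to force a collection before asserting, rather than waiting for the collector to run on its own. The sketch below is not code from this PR: `FakeConnection` and `tracked_before_and_after_collect` are hypothetical names, and the self-reference is added on purpose so that reference counting alone cannot free the object.

```python
import gc
import weakref


class FakeConnection:
    """Hypothetical stand-in for a pooled connection (not Chroma code)."""

    def __init__(self):
        # Deliberate reference cycle: only the cyclic collector can free this.
        self.owner = self


def tracked_before_and_after_collect():
    tracked = weakref.WeakSet()
    was_enabled = gc.isenabled()
    gc.disable()  # keep the collector from running at an arbitrary point
    try:
        conn = FakeConnection()
        tracked.add(conn)
        del conn
        before = len(tracked)  # the cycle keeps the object alive for now
        gc.collect()           # force the collection the weak set depends on
        after = len(tracked)
    finally:
        if was_enabled:
            gc.enable()
    return before, after


before, after = tracked_before_and_after_collect()
```

A real test would assert on open file descriptors instead of on the size of a weak set, but the same explicit `gc.collect()` call applies.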

tazarov avatar Apr 14 '24 06:04 tazarov

Can we add a screenshot showing the FD count going down?

@HammadB here's a short video to demo the PR:

https://www.loom.com/share/0d5fe43c9183439f9261381651b35ec5?sid=8fd8330f-6db9-4b5d-be23-637888314d5a

tazarov avatar Apr 19 '24 18:04 tazarov

May I ask when this will be closed?

pinsisong avatar Jul 22 '24 07:07 pinsisong

Do you know why this was happening? If I'm understanding correctly, this makes it more likely for connections to be GCed but may not fix the underlying bug. It also seems like this may result in a new connection being created for each transaction? I just want to understand how this works, since I ran into a similar issue a few months ago and never fully tracked it down.

The root of the issue is that thread locals, which we use for tracking connections, do not work well (think: deterministically) with async contexts, which FastAPI uses under the hood. This is what https://peps.python.org/pep-0567/ attempts to solve with contextvars. When asyncio recycles the thread locals (which happens occasionally), the connections stored in them should go away too. However, the PerThreadConnection pool also keeps a reference to every connection in its connections field, so those connections are never released, and that is the leak.
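The difference between the two kinds of reference can be shown with a small sketch. It is a simplified model, not the pool from this PR: `FakeConnection`, `Pool`, and `leftover_after_threads` are hypothetical names, and short-lived threads stand in for the thread-local recycling described above.

```python
import gc
import threading
import weakref


class FakeConnection:
    """Hypothetical stand-in for a database connection."""


class Pool:
    """Simplified per-thread pool; the tracking set is injected for comparison."""

    def __init__(self, tracking_set):
        self._local = threading.local()
        self._connections = tracking_set

    def connect(self):
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = FakeConnection()
            self._local.conn = conn      # per-thread slot, dropped with the thread
            self._connections.add(conn)  # pool-wide bookkeeping
        return conn


def leftover_after_threads(pool, n):
    # Each thread creates one connection and then exits.
    for _ in range(n):
        t = threading.Thread(target=pool.connect)
        t.start()
        t.join()
    gc.collect()
    return len(pool._connections)


strong_leftover = leftover_after_threads(Pool(set()), 5)
weak_leftover = leftover_after_threads(Pool(weakref.WeakSet()), 5)
```

With a plain `set`, the pool keeps every connection alive after its thread is gone. With a `weakref.WeakSet`, the entry disappears once the thread-local slot was the last strong reference.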

I had a prior impl with contextvars, but that introduces challenges of its own: the contextvars should ideally be defined at the top of the call stack in FastAPI (this is what Depends() tries to solve), and instead of PerThread we should have a proper connection pool with checkin/checkout mechanics for connections. This PR is not aimed at solving the issue at its core but at ensuring system stability in the short term.
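For illustration only, checkin/checkout mechanics could look roughly like the sketch below. It is not part of this PR or of Chroma: `CheckoutPool` and its methods are hypothetical, and it uses the standard library's `sqlite3` and `queue` modules purely as an example.

```python
import queue
import sqlite3
from contextlib import contextmanager


class CheckoutPool:
    """Hypothetical fixed-size pool with explicit checkout and checkin."""

    def __init__(self, db_path, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False allows a connection to be used by a
            # thread other than the one that created it.
            self._idle.put(sqlite3.connect(db_path, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # checkout: blocks if none idle
        try:
            yield conn
        finally:
            self._idle.put(conn)                # checkin: always returned

    def close(self):
        # Close every idle connection, releasing its file descriptor.
        while not self._idle.empty():
            self._idle.get_nowait().close()


# Note: each ":memory:" connection is its own private database.
pool = CheckoutPool(":memory:", size=2)
with pool.connection() as conn:
    value = conn.execute("SELECT 1").fetchone()[0]
idle_after = pool._idle.qsize()
pool.close()
```

Because the pool owns a fixed number of connections and every checkout is paired with a checkin, the number of open descriptors is bounded by `size` regardless of how threads or async tasks come and go.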

tazarov avatar Jul 25 '24 14:07 tazarov

I see, thank you for the great explanation. :)

codetheweb avatar Jul 25 '24 16:07 codetheweb