chroma
[ENH] Connection pool FD leak v2
Description of changes
Summarize the changes made by this PR.
- Improvements & Bug fixes
  - Using weakrefs in the pools' connections set.
Test plan
How are these changes tested?
- [x] Tests pass locally with pytest for python, yarn test for js
Documentation Changes
N/A
Refs
- https://peps.python.org/pep-0567/
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving
Testing, Bugs, Errors, Logs, Documentation
- [ ] Can you think of any use case in which the code does not behave as intended? Have they been tested?
- [ ] Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
- [ ] If appropriate, are there adequate property based tests?
- [ ] If appropriate, are there adequate unit tests?
- [ ] Should any logging, debugging, tracing information be added or removed?
- [ ] Are error messages user-friendly?
- [ ] Have all documentation changes needed been made?
- [ ] Have all non-obvious changes been commented?
System Compatibility
- [ ] Are there any potential impacts on other parts of the system or backward compatibility?
- [ ] Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?
Quality
- [ ] Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?
Needs a test to verify that the file descriptors are being closed.
Tests might be flaky due to weakrefs relying on GC.
Can we add a screenshot showing the FD count going down?
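As a rough illustration of how such a check could be automated, here is a minimal sketch that counts the current process's open file descriptors. The helper name open_fd_count is hypothetical and not part of this PR, and the approach assumes a Linux or macOS host where /proc/self/fd or /dev/fd is available.

```python
import os


def open_fd_count():
    """Count this process's open file descriptors (Linux/macOS only).

    Hypothetical helper for illustration; not part of the PR.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            # Listing the directory briefly opens one extra descriptor,
            # so compare counts relative to each other, not as absolutes.
            return len(os.listdir(fd_dir))
    raise RuntimeError("no fd directory available on this platform")
```

A test could record the count before and after a burst of requests and assert that it does not keep growing. Because the weakref cleanup depends on garbage collection, calling gc.collect() before the second measurement would probably make such a test less flaky.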
@HammadB here's a short video to demo the PR:
https://www.loom.com/share/0d5fe43c9183439f9261381651b35ec5?sid=8fd8330f-6db9-4b5d-be23-637888314d5a
May I ask when it will close?
Do you know why this was happening? If I'm understanding correctly, this makes it more likely for connections to be GCed but may not fix the underlying bug. It also seems like this may result in a new connection being created for each transaction? I just want to understand more about how this works, since I ran into a similar issue a few months ago and never fully tracked it down.
The root of the issue is that thread locals, which we use for tracking connections, do not work well (think: deterministically) with async contexts, which FastAPI uses under the hood. This is what https://peps.python.org/pep-0567/ attempts to solve with contextvars. The thread locals get recycled by asyncio (occasionally), but the PerThreadConnection pool still keeps references to each connection in its connections field, so those connections are never released and leak.
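A minimal sketch of the effect described above, using made-up classes rather than Chroma's actual pool: FakeConnection, StrongRefPool, WeakRefPool, and run_in_threads are all hypothetical names. It compares a pool that tracks connections in a plain set with one that tracks them in a weakref.WeakSet, after the threads that owned the thread locals have exited.

```python
import gc
import threading
import weakref


class FakeConnection:
    """Hypothetical stand-in for a real DB connection holding a file descriptor."""


class StrongRefPool:
    """Per-thread pool that tracks connections in a plain set (strong refs)."""

    def __init__(self):
        self._local = threading.local()
        self._connections = set()

    def connect(self):
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = FakeConnection()
            self._local.conn = conn
            self._connections.add(conn)
        return conn

    def tracked(self):
        return len(self._connections)


class WeakRefPool(StrongRefPool):
    """Same pool, but the tracking set holds only weak references."""

    def __init__(self):
        super().__init__()
        self._connections = weakref.WeakSet()


def run_in_threads(pool, n):
    """Call pool.connect() from n short-lived threads, one after another."""
    for _ in range(n):
        t = threading.Thread(target=pool.connect)
        t.start()
        t.join()


strong_pool, weak_pool = StrongRefPool(), WeakRefPool()
run_in_threads(strong_pool, 5)
run_in_threads(weak_pool, 5)
gc.collect()
print("strong set still tracks:", strong_pool.tracked())
print("weak set still tracks:", weak_pool.tracked())
```

With the plain set, every connection stays reachable after its thread local is gone. With the WeakSet, the pool no longer keeps them alive, so on CPython they are typically collected once the owning thread exits.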
I had a prior impl with contextvars, but that introduces challenges of its own: the contextvars should ideally be defined at the top of the call stack in FastAPI (this is what Depends() tries to solve), and instead of PerThread we should have a proper connection pool with check-in/check-out mechanics for connections. This PR is not aimed at solving this issue at its core but at ensuring system stability in the short term.
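For reference, check-in/check-out mechanics could look roughly like the sketch below. This is only an illustration under assumptions: CheckoutPool and its methods are hypothetical names, it uses the standard library's sqlite3 and queue modules, and it is not the design proposed for Chroma.

```python
import queue
import sqlite3
from contextlib import contextmanager


class CheckoutPool:
    """Hypothetical bounded pool with explicit check-out and check-in."""

    def __init__(self, db_path, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a connection be handed to
            # whichever thread (or async worker) checks it out next.
            self._idle.put(sqlite3.connect(db_path, check_same_thread=False))

    @contextmanager
    def checkout(self, timeout=None):
        # Blocks until a connection is idle; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always check the connection back in, even if the caller raised.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()

    def close(self):
        # Close every idle connection; callers should have checked theirs in.
        while not self._idle.empty():
            self._idle.get_nowait().close()


pool = CheckoutPool(":memory:", size=2)
with pool.checkout() as conn:
    row = conn.execute("SELECT 1").fetchone()
pool.close()
```

The number of open connections is bounded by the pool size and does not depend on thread-local lifetime or garbage collection, which is the property the short-term weakref fix does not provide.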
I see, thank you for the great explanation. :)