ART
Fix async service cleanup in LocalBackend.close()
Summary
- Fixed critical resource cleanup bug in LocalBackend.close() that was causing service close methods to not execute properly when they were async
- Made LocalBackend.close() async to properly handle async service cleanup and match the base Backend class interface
- Updated __exit__ method to handle both sync and async contexts appropriately
Problem
The close() method was synchronous but attempted to call close() on services that may have async close methods. This meant:
- Async close methods returned coroutine objects instead of executing
- Memory leaks and zombie processes accumulated over time
- Port conflicts occurred when restarting services
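The failure mode can be reproduced in isolation. The following is a minimal sketch, not the actual LocalBackend code: AsyncService and call_without_await are hypothetical names used only to show that calling an async close() without awaiting it creates a coroutine object and runs none of the cleanup.

```python
import asyncio


class AsyncService:
    """Hypothetical service whose close() is a coroutine function."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


def call_without_await(service: AsyncService) -> tuple[bool, bool]:
    """Mimic a synchronous close() that calls service.close() directly."""
    pending = service.close()  # creates a coroutine object; the body has not run
    was_coroutine = asyncio.iscoroutine(pending)
    closed_after_call = service.closed
    pending.close()  # discard the coroutine so Python does not warn about it
    return was_coroutine, closed_after_call


service = AsyncService()
print(call_without_await(service))  # (True, False)

# Only driving the coroutine to completion performs the cleanup.
asyncio.run(service.close())
print(service.closed)  # True
```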
Solution
- Made LocalBackend.close() async and added proper async/sync detection
- Service close methods are now properly awaited if they're async coroutines
- Maintained backward compatibility for synchronous close methods
- Updated __exit__ method to handle event loop contexts correctly
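One way the approach described above could look is sketched here. This is an illustration under assumptions, not the code from this PR: SketchBackend, SyncService, AsyncService, and the _close_task attribute are hypothetical, and the choice to schedule a task when a loop is already running is one possible reading of "handle event loop contexts correctly".

```python
import asyncio
import inspect


class SyncService:
    """Hypothetical service with a plain synchronous close()."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class AsyncService:
    """Hypothetical service with an async close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        await asyncio.sleep(0)  # stand-in for real async teardown work
        self.closed = True


class SketchBackend:
    """Hypothetical stand-in for LocalBackend; only the cleanup path is shown."""

    def __init__(self, services) -> None:
        self._services = list(services)
        self._close_task = None

    async def close(self) -> None:
        for service in self._services:
            close = getattr(service, "close", None)
            if close is None:
                continue
            result = close()
            # Sync close() has already run; async close() returned an awaitable.
            if inspect.isawaitable(result):
                await result

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            # No loop is running in this thread, so block until cleanup finishes.
            asyncio.run(self.close())
        else:
            # A loop is already running; schedule cleanup and keep a reference
            # so the task is not garbage collected before it completes.
            self._close_task = loop.create_task(self.close())


sync_svc, async_svc = SyncService(), AsyncService()
with SketchBackend([sync_svc, async_svc]):
    pass
print(sync_svc.closed, async_svc.closed)  # True True
```

Checking the return value with inspect.isawaitable, rather than inspecting the method itself, also covers services whose close() is a regular function that returns an awaitable.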
Impact
This fix prevents:
- GPU memory leaks from improperly closed vLLM engines
- Zombie training processes
- Port conflicts on service restart
This is critical for production stability and cost management in ML training environments.
Test Plan
- [x] All existing tests pass
- [x] Code formatting and linting checks pass
- [x] Manual verification that both sync and async service close methods work correctly