Fix 'Event loop is closed' errors during cleanup after crawling operations
Problem
The backend logs show numerous 'Event loop is closed' RuntimeError exceptions during crawling operations. While these errors are non-fatal and don't prevent the crawling/processing from completing successfully, they create noise in the logs and indicate improper resource management.
Current Behavior
- Crawling and code example extraction complete successfully
- Summaries are generated correctly using OpenAI's gpt-4.1-nano model
- Data is properly stored in the database
- After successful completion, cleanup tasks fail with 'Event loop is closed' errors
- Errors appear as `Task exception was never retrieved` with `AsyncClient.aclose()` failing
Error Pattern
```
2025-09-17 12:59:43 | asyncio | ERROR | Task exception was never retrieved
future: <Task finished name='Task-5704' coro=<AsyncClient.aclose() done, defined at /venv/lib/python3.12/site-packages/httpx/_client.py:1978> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
  File "/venv/lib/python3.12/site-packages/httpx/_client.py", line 1985, in aclose
    await self._transport.aclose()
  ...
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 545, in _check_closed
    raise RuntimeError('Event loop is closed')
```
Root Cause Analysis
The Issue
- Excessive client creation: a new OpenAI/LLM client is created for EACH summary generation operation
  - Log pattern: `Creating LLM client for provider: openai` appears for every single summary
  - Each client creates an httpx AsyncClient that needs cleanup
- Orphaned cleanup tasks: when the event loop closes (after crawling completes), there are still pending AsyncClient cleanup tasks
  - These are fire-and-forget tasks that weren't properly awaited
  - They try to execute after their event loop is already closed
- Resource lifecycle mismatch: no connection pooling or client reuse strategy
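The orphaned-cleanup scenario can be reproduced in isolation: any attempt to run a coroutine (such as httpx's `AsyncClient.aclose()`) on a loop that has already shut down raises the same `RuntimeError`. A minimal sketch, with a stand-in coroutine in place of the real httpx cleanup:

```python
import asyncio

async def fake_aclose():
    # Stand-in for httpx's AsyncClient.aclose()
    await asyncio.sleep(0)

loop = asyncio.new_event_loop()
loop.close()  # the crawl's loop has already shut down

coro = fake_aclose()
try:
    loop.run_until_complete(coro)  # cleanup attempted too late
except RuntimeError as err:
    error = str(err)  # -> "Event loop is closed"
finally:
    coro.close()  # suppress the "coroutine was never awaited" warning
```

In production the cleanup task is scheduled rather than run explicitly, but the failure point is the same closed-loop check in `asyncio/base_events.py`.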
Where to Look
Primary Investigation Areas
- LLM Provider Service (`python/src/server/services/llm_provider_service.py`)
  - Check how clients are created/destroyed
  - Look for patterns like creating a new client per operation
  - Should implement client reuse/pooling
- Code Extraction Service (`python/src/server/services/crawling/code_extraction_service.py`)
  - This is where summaries are generated during crawling
  - Check how it calls the LLM provider service
  - Look for loops that create multiple clients
- httpx AsyncClient usage
  - Search for AsyncClient creation patterns
  - Check if clients are being properly closed with `async with` or an explicit `await client.aclose()`
  - Files to check: `python/src/server/services/ollama/model_discovery_service.py`, `python/src/server/services/mcp_service_client.py`
Log Evidence
- Errors occur between 12:59:30 - 12:59:46 during summary generation
- Pattern: Create client → Generate summary → Success logged → Cleanup error
- ~1291 total errors in one session, but all operations completed successfully
Suggested Fixes
Option 1: Client Pooling (Recommended)
- Implement a singleton or pool pattern for LLM clients
- Reuse the same OpenAI client across multiple operations
- Only create new clients when switching providers
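A sketch of the pooling idea, assuming a hypothetical `create_client()` factory (the real construction lives in `llm_provider_service.py`): cache one client per provider and hand back the same instance on every call.

```python
import threading

def create_client(provider: str) -> object:
    # Hypothetical factory; stands in for the real per-provider
    # client construction (e.g. an AsyncOpenAI instance).
    return object()

_clients: dict[str, object] = {}
_lock = threading.Lock()

def get_client(provider: str) -> object:
    """Return the cached client for a provider, creating it at most once."""
    with _lock:
        if provider not in _clients:
            _clients[provider] = create_client(provider)
        return _clients[provider]
```

With this shape, `Creating LLM client for provider: openai` would be logged once per provider per process instead of once per summary.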
Option 2: Proper Cleanup Coordination
- Use `async with` context managers for all AsyncClient instances
- Ensure all cleanup tasks are awaited before the event loop closes
- Consider using `asyncio.gather()` with `return_exceptions=True` for cleanup
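The gather step might look like the following sketch, with a dummy client standing in for httpx's `AsyncClient`; `return_exceptions=True` keeps one failing `aclose()` from aborting the rest:

```python
import asyncio

class DummyClient:
    """Stand-in for httpx.AsyncClient, which also exposes aclose()."""
    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True

async def close_all(clients):
    # Await every pending close before the loop shuts down; a failure
    # in one client comes back as an exception object, not a raise.
    return await asyncio.gather(
        *(c.aclose() for c in clients), return_exceptions=True
    )

async def main():
    clients = [DummyClient() for _ in range(3)]
    await close_all(clients)
    return all(c.closed for c in clients)
```

Calling `close_all()` as the last step inside the crawl's coroutine guarantees the closes run while the loop is still alive.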
Option 3: Task Lifecycle Management
- Track all background tasks
- Cancel or await them before shutdown
- Use `asyncio.create_task()` with proper task management
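One possible shape for this option (a sketch, not the project's actual API): a small registry that holds a reference to every background task and is drained before shutdown.

```python
import asyncio

class TaskRegistry:
    """Tracks background tasks so none outlive the event loop."""

    def __init__(self):
        self._tasks: set[asyncio.Task] = set()

    def spawn(self, coro) -> asyncio.Task:
        # Keep a strong reference; drop it automatically on completion
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def shutdown(self):
        # Await everything still pending before the loop closes
        await asyncio.gather(*self._tasks, return_exceptions=True)

async def _work(sink):
    await asyncio.sleep(0)
    sink.append("cleaned up")

async def main():
    registry = TaskRegistry()
    done = []
    registry.spawn(_work(done))
    await registry.shutdown()
    return done
```

The strong-reference set also avoids the separate pitfall of fire-and-forget tasks being garbage-collected mid-flight.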
Example Problem Code Pattern
```python
# Current problematic pattern (likely):
async def generate_summary(text):
    client = create_openai_client()  # New client each time!
    result = await client.generate(text)
    # client.aclose() might be scheduled but not awaited
    return result

# Should be:
class SummaryService:
    def __init__(self):
        self.client = create_openai_client()  # Reuse the same client

    async def generate_summary(self, text):
        return await self.client.generate(text)

    async def cleanup(self):
        await self.client.aclose()  # Explicit cleanup
```
Testing
To reproduce:
- Start a crawl operation on any documentation site
- Watch logs for `Creating LLM client` messages
- After crawling completes, observe the `Event loop is closed` errors
To verify fix:
- Errors should not appear in logs after crawling
- Client creation messages should be minimal
- All operations should still complete successfully
Impact
- Severity: Low (operations work, but logs are noisy)
- Type: Resource Management / Cleanup
- Components: LLM Provider Service, Code Extraction, AsyncClient handling
Note: These errors do not affect functionality - all crawling, processing, and storage operations complete successfully. This is purely a cleanup/resource management issue.