fix: make fnllm utils fork-safe for Celery workers

Open droideronline opened this issue 6 months ago • 2 comments

Detect process forks and recreate thread resources to prevent 'RuntimeError: threads can only be started once' in worker processes.

  • Add PID tracking for fork detection
  • Safe cleanup of inherited resources
  • Fresh event loop creation per process

Problem

GraphRAG fails in Celery worker processes with "RuntimeError: threads can only be started once" when executing queries that use embedding models. The cause is that the current run_coroutine_sync() implementation is not fork-safe.

Root Cause

When Celery creates worker processes by forking:

  1. Child processes inherit the parent's module-level globals (_thr, _loop, _pid), including the thread object
  2. The inherited thread object still exists, but its underlying thread is not running (only the thread that calls fork() survives in the child)
  3. The code calls _thr.start() on this already-started, dead thread object
  4. Python raises "threads can only be started once" because a thread object can be started at most once
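
The error in step 4 can be reproduced without Celery or a fork. The snippet below is a minimal single-process stand-in, not the GraphRAG code: it uses a thread that has already finished in place of a thread object inherited across a fork, and the "start it if it is not alive" guard is an assumption about the kind of check that triggers the failure. The name worker is hypothetical.

```python
import threading

# Stand-in for the inherited thread object: it was started once and is no
# longer running, which is the state a forked child sees (assumption: a
# finished thread is used here instead of a real fork).
worker = threading.Thread(target=lambda: None, daemon=True)
worker.start()
worker.join()

error_message = None
try:
    # A guard of this shape looks safe but is not: is_alive() is False,
    # yet the thread object has already been started once.
    if not worker.is_alive():
        worker.start()
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

Because is_alive() returns False for a thread that has stopped as well as for one that was never started, the check alone cannot tell the two cases apart, so the thread object has to be replaced rather than restarted.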

Solution

This PR implements fork detection and safe resource recreation:

  • Fork Detection: Track process ID to detect when code runs in a forked child process
  • Safe Cleanup: Use contextlib.suppress() to safely clean up inherited resources
  • Fresh Resources: Create new event loop and thread specific to the child process
  • Process Isolation: Each forked process gets its own async infrastructure

Changes Made

Modified Files:

  • graphrag/language_model/providers/fnllm/utils.py

Related Issues

#1974

Proposed Changes

  1. Added imports: os and contextlib for PID tracking and safe cleanup
  2. Enhanced run_coroutine_sync(): Added fork detection logic
  3. Process tracking: Compare current PID with stored PID to detect forks
  4. Resource cleanup: Safely stop inherited event loops before creating new ones
  5. Thread recreation: Create fresh thread and event loop for each process

Checklist

  • [x] I have tested these changes locally.
  • [x] I have reviewed the code changes.
  • [ ] I have updated the documentation (if necessary).
  • [ ] I have added appropriate unit tests (if applicable).

droideronline, Jun 11 '25 19:06

@droideronline please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
    @microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
    @microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

droideronline, Jun 11 '25 19:06

@natoverse - Could you please review this PR when you have some time? Thanks.

droideronline, Jun 16 '25 09:06