
[Bug]: RuntimeError: threads can only be started once - Celery worker compatibility issue in fnllm utils

Open droideronline opened this issue 6 months ago • 2 comments

Do you need to file an issue?

  • [x] I have searched the existing issues and this bug is not already filed.
  • [x] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • [x] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

Description:

GraphRAG fails in Celery worker processes with "RuntimeError: threads can only be started once" when executing queries that use embedding models.

Root Cause

The run_coroutine_sync() function in graphrag/language_model/providers/fnllm/utils.py is not fork-safe. When Celery creates worker processes by forking, the following happens:

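  1. Process fork: Celery forks the main process to create worker processes.
  2. State inheritance: each child process inherits the parent's module-level globals (_thr, _loop, _pid), including the threading.Thread object of the background event-loop thread.
  3. Dead thread: fork() copies only the calling thread, so the background thread is not running in the child, even though its inherited Thread object still exists (is_alive() returns False).
  4. Restart attempt: the code calls _thr.start() on this dead, inherited Thread object.
  5. RuntimeError: a threading.Thread object can be started at most once, so Python raises "threads can only be started once".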

Additional Files to Reference:

  • graphrag/language_model/providers/fnllm/utils.py (lines 112-134)
  • graphrag/language_model/providers/fnllm/models.py (where run_coroutine_sync is called)

Impact:

  • GraphRAG queries fail in Celery workers
  • Any FNLLM model call routed through run_coroutine_sync crashes
  • Production deployments using Celery are broken

Steps to reproduce

Expected Behavior

No response

GraphRAG Config Used

No changes to the config.

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 2.3.0
  • Operating System: Windows
  • Python Version: 3.11
  • Related Issues:

droideronline, Jun 11 '25 19:06

Please let me know if you see this issue with the LiteLLM provider that was introduced in 2.6.0. We will be removing fnllm entirely for v3, so it would be helpful to know if this is still relevant with LiteLLM.

natoverse, Nov 17 '25 19:11


Unfortunately, the issue persists with LiteLLM as well.

droideronline, Nov 18 '25 08:11