[Bug]: RuntimeError: threads can only be started once - Celery worker compatibility issue in fnllm utils
Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [x] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- [x] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
Description:
GraphRAG fails in Celery worker processes with RuntimeError: threads can only be started once when executing queries that use embedding models.
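As background, the error itself comes from Python's single-use thread lifecycle; this minimal snippet (not GraphRAG code) reproduces the exact message:

```python
import threading

t = threading.Thread(target=lambda: None)
t.start()
t.join()
try:
    t.start()  # a Thread object can never be started a second time
except RuntimeError as e:
    print(e)  # prints: threads can only be started once
```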
Root Cause
The `run_coroutine_sync()` function in `graphrag/language_model/providers/fnllm/utils.py` is not fork-safe. When Celery creates worker processes by forking:
- Process Fork: Celery forks the main process to create workers
- Thread Inheritance: Child processes inherit the parent's global thread objects (`_thr`, `_loop`, `_pid`)
- Dead Threads: The inherited thread objects exist, but the underlying OS threads are not running in the child
- Restart Failure: Code attempts to call `_thr.start()` on a dead thread object
- Runtime Error: Python raises "threads can only be started once" because thread objects have a single-use lifecycle
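A possible direction for a fix is to detect the fork by comparing the cached PID to `os.getpid()` and rebuild the loop and thread in the child. The sketch below is an assumption about how the globals are used, not the actual fnllm implementation; only the names `_loop`, `_thr`, and `_pid` are taken from the report:

```python
import asyncio
import os
import threading

_loop = None
_thr = None
_pid = None

def _ensure_loop():
    """Return a background event loop, recreating it after a fork.

    A forked child inherits the parent's _thr object, but the OS thread
    behind it does not survive the fork, so calling _thr.start() again
    would raise "threads can only be started once". Comparing the cached
    _pid to os.getpid() detects the fork and rebuilds fresh objects.
    """
    global _loop, _thr, _pid
    if _thr is None or _pid != os.getpid():
        _loop = asyncio.new_event_loop()
        _thr = threading.Thread(target=_loop.run_forever, daemon=True)
        _thr.start()
        _pid = os.getpid()
    return _loop

def run_coroutine_sync(coro):
    # Schedule the coroutine on the background loop and block for the result.
    loop = _ensure_loop()
    return asyncio.run_coroutine_threadsafe(coro, loop).result()
```

An alternative would be `os.register_at_fork(after_in_child=...)` to reset the globals, which avoids the per-call PID check but only covers `fork`-based workers.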
Additional Files to Reference:
- `graphrag/language_model/providers/fnllm/utils.py` (lines 112-134)
- `graphrag/language_model/providers/fnllm/models.py` (where `run_coroutine_sync` is called)
Impact:
- GraphRAG queries fail in Celery workers
- Any async operation using FNLLM models crashes
- Production deployments using Celery are broken
Steps to reproduce
Expected Behavior
No response
GraphRAG Config Used
No Changes in Config
Logs and screenshots
No response
Additional Information
- GraphRAG Version: 2.3.0
- Operating System: Windows
- Python Version: 3.11
- Related Issues:
Please let me know if you see this issue with the LiteLLM provider that was introduced in 2.6.0. We will be removing fnllm entirely for v3, so it would be helpful to know if this is still relevant with LiteLLM.
Unfortunately, the issue persists with LiteLLM as well.