[Bug]: RuntimeError: threads can only be started once - Celery worker compatibility issue in fnllm utils
Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [x] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- [x] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
Description:
GraphRAG fails in Celery worker processes with RuntimeError: threads can only be started once when executing queries that use embedding models.
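As background, the error itself comes from Python's single-use thread lifecycle; this minimal snippet (not GraphRAG code) reproduces the exact message:

```python
import threading

t = threading.Thread(target=lambda: None)
t.start()
t.join()
try:
    t.start()  # a Thread object can never be started a second time
except RuntimeError as e:
    print(e)  # prints: threads can only be started once
```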
Root Cause
The `run_coroutine_sync()` function in `graphrag/language_model/providers/fnllm/utils.py` is not fork-safe. When Celery creates worker processes by forking:
- Process Fork: Celery forks the main process to create workers
- Thread Inheritance: Child processes inherit the parent's global thread objects (`_thr`, `_loop`, `_pid`)
- Dead Threads: The inherited thread objects exist, but the underlying OS threads are not running in the child
- Restart Failure: Code attempts to call `_thr.start()` on a dead thread object
- Runtime Error: Python raises "threads can only be started once" because thread objects have a single-use lifecycle
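A possible direction for a fix is to detect the fork by comparing the cached PID to `os.getpid()` and rebuild the loop and thread in the child. The sketch below is an assumption about how the globals are used, not the actual fnllm implementation; only the names `_loop`, `_thr`, and `_pid` are taken from the report:

```python
import asyncio
import os
import threading

_loop = None
_thr = None
_pid = None

def _ensure_loop():
    """Return a background event loop, recreating it after a fork.

    A forked child inherits the parent's _thr object, but the OS thread
    behind it does not survive the fork, so calling _thr.start() again
    would raise "threads can only be started once". Comparing the cached
    _pid to os.getpid() detects the fork and rebuilds fresh objects.
    """
    global _loop, _thr, _pid
    if _thr is None or _pid != os.getpid():
        _loop = asyncio.new_event_loop()
        _thr = threading.Thread(target=_loop.run_forever, daemon=True)
        _thr.start()
        _pid = os.getpid()
    return _loop

def run_coroutine_sync(coro):
    # Schedule the coroutine on the background loop and block for the result.
    loop = _ensure_loop()
    return asyncio.run_coroutine_threadsafe(coro, loop).result()
```

An alternative would be `os.register_at_fork(after_in_child=...)` to reset the globals, which avoids the per-call PID check but only covers `fork`-based workers.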
Additional Files to Reference:
- `graphrag/language_model/providers/fnllm/utils.py` (lines 112-134)
- `graphrag/language_model/providers/fnllm/models.py` (where `run_coroutine_sync` is called)
Impact:
- GraphRAG queries fail in Celery workers
- Any async operation using FNLLM models crashes
- Production deployments using Celery are broken
Steps to reproduce
Expected Behavior
No response
GraphRAG Config Used
No Changes in Config
Logs and screenshots
No response
Additional Information
- GraphRAG Version: 2.3.0
- Operating System: Windows
- Python Version: 3.11
- Related Issues:
Please let me know if you see this issue with the LiteLLM provider that was introduced in 2.6.0. We will be removing fnllm entirely for v3, so it would be helpful to know if this is still relevant with LiteLLM.
Unfortunately, the issue persists with LiteLLM as well.