
fix(jit): add filelock timeout report

Open • zobinHuang opened this issue 9 months ago • 1 comment

zobinHuang • Apr 01 '25 08:04

I pushed a commit with a case that reproduces the issue, which you can try: https://github.com/flashinfer-ai/flashinfer/pull/993/commits/58e83cfdae61aeade02c37d47460af6cad8f3220

yzh119 • Apr 02 '25 21:04

Per discussion with @abcdabcd987, we think the deadlock might be caused by NFS (if the user's cache directory is located on NFS), whose locking behavior is not fully POSIX-compliant and can affect filelock.

We will create another PR to fix the deadlock and use tmpfs for the filelock instead of the ~/.cache directory.

yzh119 • May 16 '25 19:05
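For reference, a minimal sketch of the direction described above, using the `filelock` package; the lock directory, file name, and timeout value here are illustrative, not the actual flashinfer change:

```python
import getpass
import os

from filelock import FileLock, Timeout

# Illustrative: keep the lock file on node-local tmpfs (e.g. /dev/shm) rather
# than an NFS-backed ~/.cache, since NFS locking is not fully POSIX-compliant
# and can make filelock block forever.
lock_dir = os.environ.get("TMPDIR", "/dev/shm")
lock_path = os.path.join(lock_dir, f"flashinfer-jit-{getpass.getuser()}.lock")

try:
    # A finite timeout turns a silent hang into an actionable error report.
    with FileLock(lock_path, timeout=300):
        pass  # the JIT compile and module load step would run here
except Timeout:
    raise RuntimeError(
        f"Could not acquire JIT lock {lock_path} within 300s; another process "
        "may hold it, or the lock file may be stale."
    )
```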

@yzh119 Hi, is this issue related to my stack trace? My flashinfer version is 0.2.5 and the TensorRT-LLM version is 0.20.0.

The stack trace where it is stuck:

Thread 3463494 (idle): "MainThread"
    acquire (/usr/local/lib/python3.10/dist-packages/filelock/_api.py:344)
    __enter__ (/usr/local/lib/python3.10/dist-packages/filelock/_api.py:376)
    load_cuda_ops (/usr/local/lib/python3.10/dist-packages/flashinfer/jit/core.py:134)
    get_norm_module (/usr/local/lib/python3.10/dist-packages/flashinfer/norm.py:36)
    get_module_attr (/usr/local/lib/python3.10/dist-packages/flashinfer/norm.py:50)
    _rmsnorm (/usr/local/lib/python3.10/dist-packages/flashinfer/norm.py:98)
    rmsnorm (/usr/local/lib/python3.10/dist-packages/flashinfer/norm.py:86)
    flashinfer_rmsnorm (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py:47)
    wrapped_fn (/usr/local/lib/python3.10/dist-packages/torch/_library/custom_ops.py:367)
    _fn (/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:838)
    inner (/usr/local/lib/python3.10/dist-packages/torch/_compile.py:51)
    backend_impl (/usr/local/lib/python3.10/dist-packages/torch/_library/custom_ops.py:335)
    __call__ (/usr/local/lib/python3.10/dist-packages/torch/_ops.py:756)
    __call__ (/usr/local/lib/python3.10/dist-packages/torch/_library/custom_ops.py:671)
    forward (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/modules/rms_norm.py:43)
    _call_impl (/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1762)
    _wrapped_call_impl (/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1751)
    forward (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/models/modeling_llama.py:522)
    _call_impl (/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1762)
    _wrapped_call_impl (/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1751)
    forward (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/models/modeling_llama.py:790)
    _call_impl (/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1762)
    _wrapped_call_impl (/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1751)
    forward (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/models/modeling_utils.py:517)
    model_forward (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py:2000)
    _forward_step (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py:2012)
    forward (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py:1962)
    wrapper (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/utils.py:66)
    decorate_context (/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:116)
    warmup (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py:679)
    __init__ (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py:245)
    create_py_executor_instance (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/pyexecutor/_util.py:446)
    create_py_executor (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor_creator.py:190)
    _create_engine (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/executor/worker.py:126)
    __init__ (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/executor/worker.py:128)
    worker_main (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/executor/worker.py:698)
    wrapper (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/llmapi/utils.py:35)
    call (/usr/local/lib/python3.10/dist-packages/mpi4py/futures/_core.py:844)
    server_exec (/usr/local/lib/python3.10/dist-packages/mpi4py/futures/_core.py:865)
    server_main_comm (/usr/local/lib/python3.10/dist-packages/mpi4py/futures/_core.py:1215)
    server_main_spawn (/usr/local/lib/python3.10/dist-packages/mpi4py/futures/_core.py:1222)
    server_main (/usr/local/lib/python3.10/dist-packages/mpi4py/futures/_core.py:1254)
    main (/usr/local/lib/python3.10/dist-packages/mpi4py/futures/server.py:11)
    <module> (/usr/local/lib/python3.10/dist-packages/mpi4py/futures/server.py:15)
    _run_code (/usr/lib/python3.10/runpy.py:86)
    _run_module_as_main (/usr/lib/python3.10/runpy.py:196)
Thread 3463624 (idle): "Thread-1 (_read_thread)"
    _recv_msg (/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_worker/subproc_pool.py:55)
    _read_thread (/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_worker/subproc_pool.py:191)
    run (/usr/lib/python3.10/threading.py:953)
    _bootstrap_inner (/usr/lib/python3.10/threading.py:1016)
    _bootstrap (/usr/lib/python3.10/threading.py:973)
Thread 3463781 (idle): "Thread-2"
    wait (/usr/lib/python3.10/threading.py:324)
    wait (/usr/lib/python3.10/threading.py:607)
    run (/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py:60)
    _bootstrap_inner (/usr/lib/python3.10/threading.py:1016)
    _bootstrap (/usr/lib/python3.10/threading.py:973)

foreverlms • Jul 23 '25 08:07
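For context, the main thread in the dump above is blocked in `FileLock.acquire` (filelock/_api.py:344) with no timeout, so it waits indefinitely if the lock is never released. A minimal, self-contained illustration of the behavior the issue title asks for, using the `filelock` package and an illustrative lock path rather than flashinfer's actual cache layout:

```python
from filelock import FileLock, Timeout

# Illustrative lock path, not flashinfer's real cache layout.
lock = FileLock("/tmp/example-jit.lock")

# Without a timeout, acquire() blocks forever when the lock is never released
# (e.g. a stale lock file on NFS), matching the idle MainThread above:
#   lock.acquire()

# With a timeout, the same situation is reported instead of silently hanging,
# which is what "add filelock timeout report" requests.
try:
    with lock.acquire(timeout=10):
        pass  # critical section (JIT compile and load) would run here
except Timeout:
    print(f"Timed out waiting for lock file: {lock.lock_file}")
```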

Hi @foreverlms, the deadlock issue has been resolved by https://github.com/flashinfer-ai/flashinfer/issues/1064, led by @abcdabcd987, but the fix is not available in v0.2.5. Would you mind upgrading to a later version of flashinfer?

yzh119 • Jul 23 '25 08:07

> Hi @foreverlms, the deadlock issue has been resolved by #1064, led by @abcdabcd987, but the fix is not available in v0.2.5. Would you mind upgrading to a later version of flashinfer?

Hi Zihao, would you mind explaining why there is a deadlock? It seems related to the JIT. I am using TensorRT-LLM, and for the past week the demo program has worked fine with flashinfer as the RMSNorm backend. But starting last night, the demo script suddenly hangs. I spent a few hours figuring out why and finally realized it is related to flashinfer. Is this something like a resource limitation in the JIT caching? TensorRT-LLM v0.20.0 requires flashinfer 0.2.5.

In any case, I will upgrade flashinfer and give it a try.

foreverlms • Jul 23 '25 08:07

So what is the root cause of the deadlock? Can you elaborate on it? @yzh119

Jin-Chuan • Aug 28 '25 08:08