
[Bug] Mooncake memory registration failed

Open CSEEduanyu opened this issue 7 months ago • 3 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [x] 2. The bug has not been fixed in the latest version.
  • [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • [x] 5. Please use English, otherwise it will be closed.

Describe the bug

srt/disaggregation/mooncake/transfer_engine.py, line 36, in register
    raise RuntimeError("Mooncake memory registration failed.")
E0511 07:42:36.355509 10240 rdma_context.cpp:198] Failed to register memory 0x2e1cbac200: Bad address [14]
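The `Bad address [14]` suffix is a Linux errno: 14 is `EFAULT`. One frequently reported cause of `EFAULT` when an RDMA NIC registers CUDA device memory is that no GPU peer-memory kernel module (`nvidia_peermem`, or the older `nv_peer_mem`) is loaded; this thread does not confirm that as the root cause here. The minimal Python sketch below only decodes the errno and checks `/proc/modules` for those two module names. The helper names are hypothetical, and it does not exercise Mooncake itself.

```python
import errno
import os
from typing import Set

# Kernel modules that provide GPUDirect RDMA peer-memory support
# (nvidia_peermem ships with recent NVIDIA drivers; nv_peer_mem is the legacy one).
PEER_MEMORY_MODULES = {"nvidia_peermem", "nv_peer_mem"}


def describe_errno(code: int) -> str:
    """Turn the number in "Bad address [14]" into e.g. "EFAULT: Bad address"."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Parse /proc/modules content; the first field of each line is the module name."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def has_gpu_peer_memory(modules: Set[str]) -> bool:
    """True if any known GPU peer-memory module is in the given set."""
    return bool(modules & PEER_MEMORY_MODULES)


if __name__ == "__main__":
    print(describe_errno(14))
    try:
        with open("/proc/modules") as f:
            print("GPU peer-memory module loaded:", has_gpu_peer_memory(loaded_modules(f.read())))
    except OSError:
        print("/proc/modules is not readable on this host")
```

A `False` result only means neither module name was found; a host can still expose GPU memory to the NIC through other mechanisms, so treat this as a first check rather than a diagnosis.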

Reproduction

Start the server with --disaggregation-mode prefill.

Environment

Python: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H800
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda-12.4
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 535.230.02
PyTorch: 2.6.0+cu124
sglang: 0.4.6.post2
sgl_kernel: 0.1.1
flashinfer_python: 0.2.5
triton: 3.2.0
transformers: 4.51.1
torchao: 0.10.0
numpy: 1.26.4
aiohttp: 3.11.18
fastapi: 0.115.12
hf_transfer: 0.1.9
huggingface_hub: 0.31.1
interegular: 0.3.3
modelscope: 1.25.0
orjson: 3.10.18
outlines: 0.1.11
packaging: 25.0
psutil: 7.0.0
pydantic: 2.11.4
python-multipart: 0.0.20
pyzmq: 26.4.0
uvicorn: 0.34.2
uvloop: 0.21.0
vllm: 0.8.2
xgrammar: 0.1.16
openai: 1.75.0
tiktoken: 0.9.0
anthropic: 0.51.0
litellm: 1.68.1
decord: 0.6.0
NVIDIA Topology:
      GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0  X    NV8  NV8  NV8  NV8  NV8  NV8  NV8  SYS  PIX  PHB  PHB  PHB  SYS  SYS  SYS  SYS  0-89          0              N/A
GPU1  NV8  X    NV8  NV8  NV8  NV8  NV8  NV8  SYS  PHB  PIX  PHB  PHB  SYS  SYS  SYS  SYS  0-89          0              N/A
GPU2  NV8  NV8  X    NV8  NV8  NV8  NV8  NV8  SYS  PHB  PHB  PIX  PHB  SYS  SYS  SYS  SYS  0-89          0              N/A
GPU3  NV8  NV8  NV8  X    NV8  NV8  NV8  NV8  SYS  PHB  PHB  PHB  PIX  SYS  SYS  SYS  SYS  0-89          0              N/A
GPU4  NV8  NV8  NV8  NV8  X    NV8  NV8  NV8  SYS  SYS  SYS  SYS  SYS  PIX  PHB  PHB  PHB  90-179        1              N/A
GPU5  NV8  NV8  NV8  NV8  NV8  X    NV8  NV8  SYS  SYS  SYS  SYS  SYS  PHB  PIX  PHB  PHB  90-179        1              N/A
GPU6  NV8  NV8  NV8  NV8  NV8  NV8  X    NV8  SYS  SYS  SYS  SYS  SYS  PHB  PHB  PIX  PHB  90-179        1              N/A
GPU7  NV8  NV8  NV8  NV8  NV8  NV8  NV8  X    SYS  SYS  SYS  SYS  SYS  PHB  PHB  PHB  PIX  90-179        1              N/A
NIC0  SYS  SYS  SYS  SYS  SYS  SYS  SYS  SYS  X    SYS  SYS  SYS  SYS  SYS  SYS  SYS  SYS
NIC1  PIX  PHB  PHB  PHB  SYS  SYS  SYS  SYS  SYS  X    PHB  PHB  PHB  SYS  SYS  SYS  SYS
NIC2  PHB  PIX  PHB  PHB  SYS  SYS  SYS  SYS  SYS  PHB  X    PHB  PHB  SYS  SYS  SYS  SYS
NIC3  PHB  PHB  PIX  PHB  SYS  SYS  SYS  SYS  SYS  PHB  PHB  X    PHB  SYS  SYS  SYS  SYS
NIC4  PHB  PHB  PHB  PIX  SYS  SYS  SYS  SYS  SYS  PHB  PHB  PHB  X    SYS  SYS  SYS  SYS
NIC5  SYS  SYS  SYS  SYS  PIX  PHB  PHB  PHB  SYS  SYS  SYS  SYS  SYS  X    PHB  PHB  PHB
NIC6  SYS  SYS  SYS  SYS  PHB  PIX  PHB  PHB  SYS  SYS  SYS  SYS  SYS  PHB  X    PHB  PHB
NIC7  SYS  SYS  SYS  SYS  PHB  PHB  PIX  PHB  SYS  SYS  SYS  SYS  SYS  PHB  PHB  X    PHB
NIC8  SYS  SYS  SYS  SYS  PHB  PHB  PHB  PIX  SYS  SYS  SYS  SYS  SYS  PHB  PHB  PHB  X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5
  NIC6: mlx5_6
  NIC7: mlx5_7
  NIC8: mlx5_8
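The topology matrix shows that each GPU sits behind the same PCIe switch (PIX) as exactly one NIC, while NIC0 (mlx5_0) is SYS-distant from every GPU. The Python sketch below derives a "closest NIC" per GPU from the NIC columns of the rows above, assuming "closest" simply means the lowest-ranked link type in the legend (PIX < PXB < PHB < NODE < SYS). It is an illustration only; it is not how Mooncake's own topology discovery works, and the names are hypothetical.

```python
from typing import Dict, List, Optional

# NIC0..NIC8 columns of each GPU row, copied from the topology matrix above.
GPU_TO_NIC_LINKS: Dict[str, str] = {
    "GPU0": "SYS PIX PHB PHB PHB SYS SYS SYS SYS",
    "GPU1": "SYS PHB PIX PHB PHB SYS SYS SYS SYS",
    "GPU2": "SYS PHB PHB PIX PHB SYS SYS SYS SYS",
    "GPU3": "SYS PHB PHB PHB PIX SYS SYS SYS SYS",
    "GPU4": "SYS SYS SYS SYS SYS PIX PHB PHB PHB",
    "GPU5": "SYS SYS SYS SYS SYS PHB PIX PHB PHB",
    "GPU6": "SYS SYS SYS SYS SYS PHB PHB PIX PHB",
    "GPU7": "SYS SYS SYS SYS SYS PHB PHB PHB PIX",
}

# NIC legend: NICk -> mlx5_k.
NIC_NAMES: List[str] = [f"mlx5_{i}" for i in range(9)]

# Lower rank = fewer hops, following the legend descriptions.
LINK_RANK: Dict[str, int] = {"PIX": 0, "PXB": 1, "PHB": 2, "NODE": 3, "SYS": 4}


def closest_nic(links: str) -> Optional[str]:
    """Return the NIC with the best-ranked link; ties go to the lowest NIC index."""
    ranked = [(LINK_RANK[link], idx) for idx, link in enumerate(links.split()) if link in LINK_RANK]
    if not ranked:
        return None
    _, best_idx = min(ranked)
    return NIC_NAMES[best_idx]


if __name__ == "__main__":
    for gpu, links in GPU_TO_NIC_LINKS.items():
        print(gpu, "->", closest_nic(links))
```

Run as a script, it prints GPU0 -> mlx5_1 through GPU7 -> mlx5_8, i.e. GPUk pairs with mlx5_(k+1) on this host.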

Hypervisor vendor: KVM
ulimit soft: 1048576

CSEEduanyu, May 11 '25 07:05

gdrcopy_copybw output:
GPU id:0; name: NVIDIA H800; Bus id: 0000:63:00
GPU id:1; name: NVIDIA H800; Bus id: 0000:67:00
GPU id:2; name: NVIDIA H800; Bus id: 0000:6b:00
GPU id:3; name: NVIDIA H800; Bus id: 0000:6f:00
GPU id:4; name: NVIDIA H800; Bus id: 0000:a3:00
GPU id:5; name: NVIDIA H800; Bus id: 0000:a7:00
GPU id:6; name: NVIDIA H800; Bus id: 0000:ab:00
GPU id:7; name: NVIDIA H800; Bus id: 0000:af:00
selecting device 0
testing size: 131072
rounded size: 131072
gpu alloc fn: cuMemAlloc
device ptr: 7fbfb7e00000
map_d_ptr: 0x7fc1e81e7000
info.va: 7fbfb7e00000
info.mapped_size: 131072
info.page_size: 65536
info.mapped: 1
info.wc_mapping: 1
page offset: 0
user-space pointer:0x7fc1e81e7000
writing test, size=131072 offset=0 num_iters=10000
write BW: 17884.9MB/s
reading test, size=131072 offset=0 num_iters=100
read BW: 669.866MB/s
unmapping buffer
unpinning buffer
closing gdrdrv

CSEEduanyu, May 11 '25 08:05

I got the same error. What version of Mooncake are you using?

feng397, May 11 '25 10:05

Same error here:

I0513 11:04:17.446540  5509 transfer_engine.cpp:350] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I0513 11:04:17.446815  5509 transfer_engine.cpp:44] Transfer Engine starting. Server: 10.148.0.107, Metadata: P2PHANDSHAKE, ip_or_host_name: , rpc_port: 0
I0513 11:04:17.446906  5509 transfer_engine.cpp:100] Transfer Engine RPC using P2P handshake, listening on 10.148.0.107:16845
I0513 11:04:17.447018  5509 transfer_engine.cpp:112] Auto-discovering topology...
I0513 11:04:17.447331  5509 transfer_engine.cpp:127] Topology discovery complete. Found 3 HCAs.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0513 11:04:17.451316  5511 transfer_engine.cpp:350] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I0513 11:04:17.451431  5511 transfer_engine.cpp:44] Transfer Engine starting. Server: 10.148.0.107, Metadata: P2PHANDSHAKE, ip_or_host_name: , rpc_port: 0
I0513 11:04:17.451501  5511 transfer_engine.cpp:100] Transfer Engine RPC using P2P handshake, listening on 10.148.0.107:16093
I0513 11:04:17.451609  5511 transfer_engine.cpp:112] Auto-discovering topology...
I0513 11:04:17.451884  5511 transfer_engine.cpp:127] Topology discovery complete. Found 3 HCAs.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0513 11:04:17.452726  5512 transfer_engine.cpp:350] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I0513 11:04:17.452818  5512 transfer_engine.cpp:44] Transfer Engine starting. Server: 10.148.0.107, Metadata: P2PHANDSHAKE, ip_or_host_name: , rpc_port: 0
I0513 11:04:17.452875  5512 transfer_engine.cpp:100] Transfer Engine RPC using P2P handshake, listening on 10.148.0.107:15853
I0513 11:04:17.452991  5512 transfer_engine.cpp:112] Auto-discovering topology...
I0513 11:04:17.453236  5512 transfer_engine.cpp:127] Topology discovery complete. Found 3 HCAs.
I0513 11:04:17.457496  5509 rdma_context.cpp:411] Find best gid index: 0 on mlx5_4"temp"/
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0513 11:04:17.457509  5505 transfer_engine.cpp:350] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I0513 11:04:17.457654  5505 transfer_engine.cpp:44] Transfer Engine starting. Server: 10.148.0.107, Metadata: P2PHANDSHAKE, ip_or_host_name: , rpc_port: 0
I0513 11:04:17.457715  5505 transfer_engine.cpp:100] Transfer Engine RPC using P2P handshake, listening on 10.148.0.107:16766
I0513 11:04:17.457805  5505 transfer_engine.cpp:112] Auto-discovering topology...
I0513 11:04:17.458135  5505 transfer_engine.cpp:127] Topology discovery complete. Found 3 HCAs.
I0513 11:04:17.458601  5509 rdma_context.cpp:125] RDMA device: mlx5_4"temp", LID: 66, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:24:c9:26
I0513 11:04:17.461931  5511 rdma_context.cpp:411] Find best gid index: 0 on mlx5_4"temp"/
I0513 11:04:17.462610  5511 rdma_context.cpp:125] RDMA device: mlx5_4"temp", LID: 66, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:24:c9:26
I0513 11:04:17.462841  5512 rdma_context.cpp:411] Find best gid index: 0 on mlx5_4"temp"/
I0513 11:04:17.463502  5512 rdma_context.cpp:125] RDMA device: mlx5_4"temp", LID: 66, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:24:c9:26
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0513 11:04:17.464078  5508 transfer_engine.cpp:350] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I0513 11:04:17.464174  5508 transfer_engine.cpp:44] Transfer Engine starting. Server: 10.148.0.107, Metadata: P2PHANDSHAKE, ip_or_host_name: , rpc_port: 0
I0513 11:04:17.464231  5508 transfer_engine.cpp:100] Transfer Engine RPC using P2P handshake, listening on 10.148.0.107:15913
I0513 11:04:17.464336  5508 transfer_engine.cpp:112] Auto-discovering topology...
I0513 11:04:17.464612  5508 transfer_engine.cpp:127] Topology discovery complete. Found 3 HCAs.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0513 11:04:17.467101  5507 transfer_engine.cpp:350] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I0513 11:04:17.467196  5507 transfer_engine.cpp:44] Transfer Engine starting. Server: 10.148.0.107, Metadata: P2PHANDSHAKE, ip_or_host_name: , rpc_port: 0
I0513 11:04:17.467249  5507 transfer_engine.cpp:100] Transfer Engine RPC using P2P handshake, listening on 10.148.0.107:15905
I0513 11:04:17.467348  5507 transfer_engine.cpp:112] Auto-discovering topology...
I0513 11:04:17.467649  5507 transfer_engine.cpp:127] Topology discovery complete. Found 3 HCAs.
I0513 11:04:17.468641  5505 rdma_context.cpp:411] Find best gid index: 0 on mlx5_4"temp"/
I0513 11:04:17.469282  5505 rdma_context.cpp:125] RDMA device: mlx5_4"temp", LID: 66, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:24:c9:26
I0513 11:04:17.469362  5509 rdma_context.cpp:411] Find best gid index: 0 on mlx5_5"temp"/
I0513 11:04:17.470508  5509 rdma_context.cpp:125] RDMA device: mlx5_5"temp", LID: 51, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:14:aa
I0513 11:04:17.471184  5511 rdma_context.cpp:411] Find best gid index: 0 on mlx5_5"temp"/
I0513 11:04:17.471905  5511 rdma_context.cpp:125] RDMA device: mlx5_5"temp", LID: 51, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:14:aa
I0513 11:04:17.473951  5508 rdma_context.cpp:411] Find best gid index: 0 on mlx5_4"temp"/
I0513 11:04:17.474507  5512 rdma_context.cpp:411] Find best gid index: 0 on mlx5_5"temp"/
I0513 11:04:17.474612  5508 rdma_context.cpp:125] RDMA device: mlx5_4"temp", LID: 66, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:24:c9:26
I0513 11:04:17.475162  5512 rdma_context.cpp:125] RDMA device: mlx5_5"temp", LID: 51, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:14:aa
I0513 11:04:17.477639  5507 rdma_context.cpp:411] Find best gid index: 0 on mlx5_4"temp"/
I0513 11:04:17.478277  5507 rdma_context.cpp:125] RDMA device: mlx5_4"temp", LID: 66, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:24:c9:26
I0513 11:04:17.479204  5505 rdma_context.cpp:411] Find best gid index: 0 on mlx5_5"temp"/
I0513 11:04:17.479861  5505 rdma_context.cpp:125] RDMA device: mlx5_5"temp", LID: 51, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:14:aa
I0513 11:04:17.480155  5509 rdma_context.cpp:411] Find best gid index: 0 on mlx5_3/
I0513 11:04:17.480209  5511 rdma_context.cpp:411] Find best gid index: 0 on mlx5_3/
I0513 11:04:17.481312  5509 rdma_context.cpp:125] RDMA device: mlx5_3, LID: 31, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:13:92
I0513 11:04:17.481333  5511 rdma_context.cpp:125] RDMA device: mlx5_3, LID: 31, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:13:92
I0513 11:04:17.484032  5508 rdma_context.cpp:411] Find best gid index: 0 on mlx5_5"temp"/
I0513 11:04:17.484501  5512 rdma_context.cpp:411] Find best gid index: 0 on mlx5_3/
I0513 11:04:17.484687  5508 rdma_context.cpp:125] RDMA device: mlx5_5"temp", LID: 51, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:14:aa
I0513 11:04:17.485165  5512 rdma_context.cpp:125] RDMA device: mlx5_3, LID: 31, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:13:92
I0513 11:04:17.488559  5507 rdma_context.cpp:411] Find best gid index: 0 on mlx5_5"temp"/
I0513 11:04:17.489305  5507 rdma_context.cpp:125] RDMA device: mlx5_5"temp", LID: 51, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:14:aa
I0513 11:04:17.489553  5505 rdma_context.cpp:411] Find best gid index: 0 on mlx5_3/
I0513 11:04:17.490327  5505 rdma_context.cpp:125] RDMA device: mlx5_3, LID: 31, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:13:92
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0513 11:04:17.491269  5506 transfer_engine.cpp:350] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I0513 11:04:17.491389  5506 transfer_engine.cpp:44] Transfer Engine starting. Server: 10.148.0.107, Metadata: P2PHANDSHAKE, ip_or_host_name: , rpc_port: 0
I0513 11:04:17.491463  5506 transfer_engine.cpp:100] Transfer Engine RPC using P2P handshake, listening on 10.148.0.107:16507
I0513 11:04:17.491551  5506 transfer_engine.cpp:112] Auto-discovering topology...
I0513 11:04:17.491820  5506 transfer_engine.cpp:127] Topology discovery complete. Found 3 HCAs.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0513 11:04:17.493638  5510 transfer_engine.cpp:350] Metrics reporting is disabled (set MC_TE_METRIC=1 to enable)
I0513 11:04:17.493726  5510 transfer_engine.cpp:44] Transfer Engine starting. Server: 10.148.0.107, Metadata: P2PHANDSHAKE, ip_or_host_name: , rpc_port: 0
I0513 11:04:17.493782  5510 transfer_engine.cpp:100] Transfer Engine RPC using P2P handshake, listening on 10.148.0.107:16320
I0513 11:04:17.493878  5510 transfer_engine.cpp:112] Auto-discovering topology...
I0513 11:04:17.494138  5510 transfer_engine.cpp:127] Topology discovery complete. Found 3 HCAs.
I0513 11:04:17.496711  5508 rdma_context.cpp:411] Find best gid index: 0 on mlx5_3/
I0513 11:04:17.497254  5507 rdma_context.cpp:411] Find best gid index: 0 on mlx5_3/
I0513 11:04:17.497359  5508 rdma_context.cpp:125] RDMA device: mlx5_3, LID: 31, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:13:92
I0513 11:04:17.498096  5507 rdma_context.cpp:125] RDMA device: mlx5_3, LID: 31, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:13:92
I0513 11:04:17.502910  5506 rdma_context.cpp:411] Find best gid index: 0 on mlx5_4"temp"/
I0513 11:04:17.503547  5510 rdma_context.cpp:411] Find best gid index: 0 on mlx5_4"temp"/
I0513 11:04:17.503665  5506 rdma_context.cpp:125] RDMA device: mlx5_4"temp", LID: 66, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:24:c9:26
I0513 11:04:17.504220  5510 rdma_context.cpp:125] RDMA device: mlx5_4"temp", LID: 66, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:24:c9:26
I0513 11:04:17.512956  5506 rdma_context.cpp:411] Find best gid index: 0 on mlx5_5"temp"/
I0513 11:04:17.513473  5510 rdma_context.cpp:411] Find best gid index: 0 on mlx5_5"temp"/
I0513 11:04:17.513661  5506 rdma_context.cpp:125] RDMA device: mlx5_5"temp", LID: 51, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:14:aa
I0513 11:04:17.514109  5510 rdma_context.cpp:125] RDMA device: mlx5_5"temp", LID: 51, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:14:aa
I0513 11:04:17.522951  5506 rdma_context.cpp:411] Find best gid index: 0 on mlx5_3/
I0513 11:04:17.523634  5506 rdma_context.cpp:125] RDMA device: mlx5_3, LID: 31, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:13:92
I0513 11:04:17.524181  5510 rdma_context.cpp:411] Find best gid index: 0 on mlx5_3/
I0513 11:04:17.524910  5510 rdma_context.cpp:125] RDMA device: mlx5_3, LID: 31, GID: (GID_Index 0) fe:80:00:00:00:00:00:00:a0:88:c2:03:00:2a:13:92
E0513 11:04:18.122572  5509 rdma_context.cpp:198] Failed to register memory 0x7f7a4e000000: Bad address [14]
[2025-05-13 11:04:18 TP4] Mooncake memory registration failed.
[2025-05-13 11:04:18 TP4] Scheduler hit an exception: Traceback (most recent call last):
  File "/opt/app/python3.12/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 2372, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, pp_rank, dp_rank)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app/python3.12/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 463, in __init__
    self.init_disaggregation()
  File "/opt/app/python3.12/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 620, in init_disaggregation
    self.disagg_prefill_bootstrap_queue = PrefillBootstrapQueue(
                                          ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app/python3.12/lib/python3.12/site-packages/sglang/srt/disaggregation/prefill.py", line 82, in __init__
    self.kv_manager = self._init_kv_manager()
                      ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app/python3.12/lib/python3.12/site-packages/sglang/srt/disaggregation/prefill.py", line 116, in _init_kv_manager
    kv_manager = kv_manager_class(
                 ^^^^^^^^^^^^^^^^^
  File "/opt/app/python3.12/lib/python3.12/site-packages/sglang/srt/disaggregation/mooncake/conn.py", line 146, in __init__
    self.register_buffer_to_engine()
  File "/opt/app/python3.12/lib/python3.12/site-packages/sglang/srt/disaggregation/mooncake/conn.py", line 173, in register_buffer_to_engine
    self.engine.register(kv_data_ptr, kv_data_len)
  File "/opt/app/python3.12/lib/python3.12/site-packages/sglang/srt/disaggregation/mooncake/transfer_engine.py", line 36, in register
    raise RuntimeError("Mooncake memory registration failed.")
RuntimeError: Mooncake memory registration failed.
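In this log all eight transfer-engine instances (glog thread IDs 5505 to 5512) open the same three HCAs (mlx5_4, mlx5_5, mlx5_3), and the only registration failure is logged by ID 5509, immediately before TP4 reports "Mooncake memory registration failed." The Python sketch below summarizes a log like this one by grouping `RDMA device:` lines per glog thread ID and extracting `Failed to register memory` entries. The regexes assume the standard glog prefix seen above; the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# glog prefix: <severity><MMDD> <hh:mm:ss.micros> <thread id> <file>:<line>] <message>
GLOG_LINE = re.compile(r"^([IWEF])\d{4} [\d:.]+\s+(\d+) (\S+):\d+\] (.*)$")
RDMA_DEVICE = re.compile(r'RDMA device: ([^,"]+)')
REG_FAILURE = re.compile(r"Failed to register memory (0x[0-9a-fA-F]+): (.+) \[(\d+)\]")

Failure = Tuple[str, str, str, int]  # (thread id, address, reason, errno)


def summarize(log_text: str) -> Tuple[Dict[str, List[str]], List[Failure]]:
    devices: Dict[str, List[str]] = defaultdict(list)
    failures: List[Failure] = []
    for line in log_text.splitlines():
        match = GLOG_LINE.match(line.strip())
        if match is None:
            continue  # skip non-glog lines such as "[... TP4]" messages and tracebacks
        _severity, thread_id, _source, message = match.groups()
        device = RDMA_DEVICE.search(message)
        if device:
            devices[thread_id].append(device.group(1))
        failure = REG_FAILURE.search(message)
        if failure:
            address, reason, err = failure.groups()
            failures.append((thread_id, address, reason, int(err)))
    return dict(devices), failures


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:  # usage: python summarize_log.py <path-to-log>
        with open(sys.argv[1]) as f:
            devs, fails = summarize(f.read())
        for tid, names in sorted(devs.items()):
            print(tid, names)
        for fail in fails:
            print("registration failure:", fail)
```

Applied to the log above, it should report the failure as ("5509", "0x7f7a4e000000", "Bad address", 14).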

YosanHo, May 13 '25 03:05

Closed as resolved in https://github.com/kvcache-ai/Mooncake/issues/351.

ShangmingCai, May 14 '25 07:05

> I got the same error. What version of Mooncake are you using?

@feng397 Have you completely resolved this issue?

CSEEduanyu, May 15 '25 13:05

>> I got the same error. What version of Mooncake are you using?
>
> @feng397 Have you completely resolved this issue?

We also hit this problem and have no idea what the root cause is.

mingxiao666, May 21 '25 12:05

>>> I got the same error. What version of Mooncake are you using?
>>
>> @feng397 Have you completely resolved this issue?
>
> We also hit this problem and have no idea what the root cause is.

@mingxiao666 You can refer to https://github.com/pytorch/pytorch/issues/153688#issuecomment-2891704714

CSEEduanyu, May 21 '25 14:05