thqq479

14 comments by thqq479

> I met the same problem. This issue is usually caused by insufficient GPU memory. Try to reserve more GPU memory for the DeepEP buffer.

How do I set this?
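To make the question concrete, this is how I currently understand "reserve more GPU memory" on the SGLang side. The flag `--mem-fraction-static` is SGLang's knob for how much memory it pre-allocates; the model path and TP size below are placeholders, so please correct me if this is not what was meant:

```bash
# A sketch, not an official fix: lower SGLang's static memory fraction so the
# DeepEP buffer has more free GPU memory to allocate from. A smaller
# --mem-fraction-static leaves more headroom outside SGLang's weight/KV-cache
# pool. Model path and TP size are placeholders.
python -m sglang.launch_server \
  --model-path /path/to/model \
  --tp 8 \
  --mem-fraction-static 0.75
```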

In this version, the H100 machine does not have RDMA. Is it feasible to use NVLink for transfers between nodes within a single machine? @alogfans

I have the same issue. Have you solved it yet? @ChuanhongLi

On the H100 machine, I tried the latest version, but I still got the same OOM error. @alogfans @ShangmingCai Hi, could you please assist us?

When `SGLANG_MOONCAKE_CUSTOM_MEM_POOL` is set to true, GPU memory usage becomes abnormal.
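To show what I mean by "abnormal", this is roughly how I watch GPU memory while starting the server with the custom mem pool enabled (plain `nvidia-smi`, nothing Mooncake-specific):

```bash
# Monitor per-GPU memory once per second while the server starts with the
# custom memory pool enabled. The env var is the one discussed above; the
# monitoring command itself is standard nvidia-smi.
export SGLANG_MOONCAKE_CUSTOM_MEM_POOL=True
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1
```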

> > On the H100 machine, I tried the latest version, but I still got the same OOM error. [@alogfans](https://github.com/alogfans) [@ShangmingCai](https://github.com/ShangmingCai) Hi, could you please assist us?
>
> [@thqq479](https://github.com/thqq479)...

> Enable `USE_CUDA` and `USE_MNNVL` in https://github.com/kvcache-ai/Mooncake/blob/897728ddbfb8c1269c3cb64b4097e281d203faff/mooncake-common/common.cmake Then compile and install from source.
>
> Set this up with sglang:
>
> ```
> export SGLANG_MOONCAKE_CUSTOM_MEM_POOL=True
> export MC_FORCE_MNNVL=True
> ...
> ```
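For anyone else hitting this, here is how I read those build steps. Treat it as a sketch: passing `-D` overrides at configure time is my assumption (it only works if `USE_CUDA`/`USE_MNNVL` are plain CMake `option()`s), and editing `mooncake-common/common.cmake` directly, as suggested above, is the safe path:

```bash
# Sketch of building Mooncake from source with the two options enabled.
# Assumes a standard CMake layout; if the -D overrides don't take effect,
# flip the options inside mooncake-common/common.cmake by hand instead.
git clone https://github.com/kvcache-ai/Mooncake.git
cd Mooncake
mkdir build && cd build
cmake .. -DUSE_CUDA=ON -DUSE_MNNVL=ON
make -j"$(nproc)"
sudo make install
```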

> [@thqq479](https://github.com/thqq479) It requires CUDA 12.8+. What is your CUDA version?

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools,...

@alogfans @ShangmingCai Hi, may I ask how I should proceed to further investigate this issue, or what environment you tested in?

> > [@alogfans](https://github.com/alogfans) [@ShangmingCai](https://github.com/ShangmingCai) Hi, may I ask how I should proceed to further investigate this issue, or what environment you tested in?
>
> GB200

The GB200 model is...