ch-tiger1
I have the same error, the error is as follows:

```
/sgl-workspace/nvshmem/src/modules/transport/common/transport_gdr_common.cpp 73 GDR driver version: (2, 4)
/sgl-workspace/nvshmem/src/modules/transport/ibgda/ibgda.cpp:nvshmemt_init:3626: neither nv_peer_mem, or nvidia_peermem detected. Skipping transport.
/sgl-workspace/nvshmem/src/host/transport/transport.cpp:nvshmemi_transport_init:275: init failed for...
```
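If you see the `neither nv_peer_mem, or nvidia_peermem detected` line, it is worth first confirming that a GPUDirect RDMA peer-memory kernel module is actually loaded on the node. Below is a minimal sketch of such a check (plain Python reading `/proc/modules` on Linux; the helper name is mine, and the module names are taken from the nvshmem message above), not something DeepEP or nvshmem provides:

```python
# Minimal sketch: check whether a peer-memory kernel module is loaded.
# Module names come from the nvshmem log above; the helper itself is
# not part of DeepEP or nvshmem.
from pathlib import Path

PEER_MEM_MODULES = ("nv_peer_mem", "nvidia_peermem")

def loaded_peer_mem_modules():
    loaded = {line.split()[0] for line in Path("/proc/modules").read_text().splitlines()}
    return [m for m in PEER_MEM_MODULES if m in loaded]

if __name__ == "__main__":
    found = loaded_peer_mem_modules()
    if found:
        print("Peer-memory module loaded:", ", ".join(found))
    else:
        print("No peer-memory module loaded; try `sudo modprobe nvidia_peermem` "
              "(shipped with recent NVIDIA drivers) or install nv_peer_mem, then rerun.")
```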
> I've found a "good old version" that works with "IBGDA disabled" machines, which is [a84a248](https://github.com/deepseek-ai/DeepEP/commit/a84a24808fb0ea732f49b874cc456a69dde69076)

Thank you. I just looked at the modified patch and found that it is...
> I have the same error, the error is as follows:
>
> ```
> /sgl-workspace/nvshmem/src/modules/transport/common/transport_gdr_common.cpp 73 GDR driver version: (2, 4)
> /sgl-workspace/nvshmem/src/modules/transport/ibgda/ibgda.cpp:nvshmemt_init:3626: neither nv_peer_mem, or nvidia_peermem detected. Skipping...
> ```
> [@koanho](https://github.com/koanho) Have you modified the driver config? https://github.com/deepseek-ai/DeepEP/tree/main/third-party#4-configure-nvidia-driver

@sphish I used the above method to configure the NVIDIA driver, but I still get a segmentation fault. Is there any other solution?
> Maybe you should increase this value `num_max_dispatch_tokens_per_rank`

Thanks. Is there any standard I can refer to? What is the appropriate setting? I see that the default value of this...
> `num_max_dispatch_tokens_per_rank` means the maximum number of tokens to send in a single batch (it must be consistent across all ranks).
>
> So the "appropriate setting" would be the batch size...
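To make that sizing rule concrete, here is a minimal sketch of deriving the value from the serving configuration. It is plain Python with no DeepEP calls; the helper name and the rounding are my own assumptions, and the only points taken from the answer above are that the value bounds how many tokens one rank sends in a single dispatch and that every rank must pass the same number:

```python
# Hypothetical helper: derive num_max_dispatch_tokens_per_rank from the
# serving configuration. The value bounds how many tokens one rank may
# send in a single dispatch, and every rank must use the same value.

def choose_num_max_dispatch_tokens_per_rank(max_decode_batch_per_rank: int,
                                            max_prefill_chunk_per_rank: int = 0) -> int:
    # Decode dispatches roughly one token per sequence per step, so the
    # decode bound is the per-rank batch size; if the same buffer also
    # serves chunked prefill, its chunk size can dominate instead.
    needed = max(max_decode_batch_per_rank, max_prefill_chunk_per_rank)
    # Round up to a multiple of 8 for a little headroom; this padding is
    # an assumption, not a DeepEP requirement.
    return (needed + 7) // 8 * 8

# Example: up to 128 decoding sequences per rank.
num_max_dispatch_tokens_per_rank = choose_num_max_dispatch_tokens_per_rank(128)
assert num_max_dispatch_tokens_per_rank == 128  # identical on every rank
```

Whichever value you pick, use it consistently when sizing the buffer and when dispatching; if the actual batch can exceed it, increasing the value (as suggested above) is the first thing to try.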