Pak Markthub

58 comments by Pak Markthub

@tylerjereddy Thank you for the additional info. We also call `gdr_unpin_buffer` inside `gdr_close`, but I would not expect to see a segfault in `LIST_REMOVE` if it came from there. A few requests:...

There are multiple things that went wrong here. Let's start with the raw output from my instrumented code without your patch. 1. The output from the instrumented code is in...

IIUC, NVSHMEM does not use GDRCopy directly in that environment. I don't know the libfabric programming model. Is it thread-safe? Does it require special handling from the libfabric caller...

Looking at the log you posted in the libfabric issue 10041, you have ``` [112670, 112670] cuda_gdrcopy_dev_unregister() checkpoint 2 after spin lock and before unmap gdrcopy->mh=(nil) ===> [nid001252, 112670, 112670]...

@zigzagcai Sorry, I missed your question. GDRCopy comes with its own driver. You need to be root to install this driver. You can use this gdrdrv container image hosted in...

Hi @wanglecheng123, Do you plan to use the gdrdrv container image? In that case, I suggest that you use it as part of the GPU Operator. That will do the...

Thank you, @billysuh7. conda is not in our support list. I will discuss your request and our future direction with the team. By the way, how do you manage...

Hi @Notherthing, As you have already figured out, the limitation is your GPU BAR size. This is a GPU hardware characteristic. There is not much we can do here....

I am not sure what that script does. Because A4000 is not in the support list, I would not advise you to try it. Generally, small BAR GPUs remain as...

Hi @osayamenja, Did you have `NVCCFLAGS` or `CPPFLAGS` set to something related to `nvcc`? This issue means that the CUDA kernel is launched incorrectly. In most cases, it is from...