schung-amd
Hi @BrendanCunningham, while the driver is not yet public, is it possible to provide some code with your caching logic that reproduces the issue? I've reached out to our kernel...
Thanks for the quick response!

> Yes. I misspoke before; our AMD DMA code is not ready but it is public. [hfi1/pin_amd.c](https://github.com/cornelisnetworks/opa-distro-drivers/blob/rhel9.3/drivers/infiniband/hw/hfi1/pin_amd.c) has our send-side AMD DMA support, i.e. construct...
Hi @BrendanCunningham, thanks for following up on this! Sorry for the delay, I'm trying to collect more information from our internal teams before providing answers because I don't have a...
@BrendanCunningham Still gathering information re: calling rdma_put_pages() from an interrupt context; the internal team initially recommends against calling it, but is digging into the code to check.

> Correct; that...
According to the internal team, your driver should have full control over the lifetime of the buffers; amdgpu guarantees that your buffers are resident until the driver calls rdma_put_pages() on...
Thanks for the logs! I'll pass them on to the internal team for more insight. As discussed, I wouldn't expect the callback to be called anywhere, as the internal team...
Thanks for the clarification, I discussed this with the internal team. Unfortunately, we do not provide support for this. From our point of view, `hfi1` needs to have full control...
Sorry for the delay, I've gotten some answers from the internal team.

> Are all ROCm-VA:GPU-page mappings mapped into the process virtual address space? If so, can we monitor those...
> We have modified our driver, hfi1 to monitor ROCm VA ranges for UNMAP with mmu_interval_notifier and remove those entries from our cache.
>
> Prior to this change, our...
To clarify, in your reproducer (and I assume in the intended use case), the memory that the VAs point to is device memory allocated with hipMalloc and freed with hipFree?