gdrcopy
cudaMalloc no longer guarantees 64 kB-aligned addresses
GDRDRV requires 64 kB-aligned addresses, as these excerpts show:
gdrdrv_pin_buffer() {
    ...
    page_virt_start = params.addr & GPU_PAGE_MASK;
    page_virt_end = params.addr + params.size - 1;
    rounded_size = page_virt_end - page_virt_start + 1;
    mr->offset = params.addr & GPU_PAGE_OFFSET;
    ...
}
and
gdrdrv_mmap() {
    ...
    if (mr->offset) {
        gdr_dbg("offset != 0 is not supported\n");
        ret = -EINVAL;
        goto out;
    }
    ...
}
This alignment is no longer guaranteed by cudaMalloc in recent CUDA drivers (since 410). A temporary workaround (WAR), at the application level, is to allocate with cudaMalloc a region of size + GPU_PAGE_SIZE bytes and then advance to the first 64 kB-aligned address inside it. Something like:
alloc_size = buffer_size + GPU_PAGE_SIZE;   /* room for the worst-case alignment shift */
cuMemAlloc(&base_addr, alloc_size);         /* keep base_addr for cuMemFree() */
dev_addr = base_addr;
if (dev_addr % GPU_PAGE_SIZE) {
    dev_addr += GPU_PAGE_SIZE - (dev_addr % GPU_PAGE_SIZE);
}
I have also run into this bug.
@drossetti Can we remove the offset check in gdrdrv_mmap()? To help users, we could also add a flag to gdr_map() so that it automatically applies the offset (if any) before returning ptr_va.
@e-ago What if gdrcopy users were instead responsible for aligning the start address to the proper page boundary and for handling the offset themselves?
I don't think we are ready to tackle this problem yet, so I am removing the 2.1 tag.