JetsonUnifiedMemory

How can I avoid memory copies between host and device?

Open • caoxuefengzz opened this issue 2 years ago • 0 comments

Hello annikabrundyn! I used your "unified memory" code for YOLOv3 TensorRT inference. I run my code on a Jetson AGX (TensorRT 7.1.3, CUDA 10.2). However, the memory copies between host and device seem unchanged.
I changed the code below:

```python
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    # Allocate host and device buffers
    host_mem = cuda.pagelocked_empty(size, dtype)
    device_mem = cuda.mem_alloc(host_mem.nbytes)
    # Append the device buffer to device bindings.
    bindings.append(int(device_mem))
    # Append to the appropriate list.
    if engine.binding_is_input(binding):
        inputs.append(HostDeviceMem(host_mem, device_mem))
    else:
        outputs.append(HostDeviceMem(host_mem, device_mem))
```

---------- changed to ----------

```python
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(binding))

    # Allocate single managed mem buffer
    mem = cuda.managed_empty(size, dtype, mem_flags=cuda.mem_attach_flags.GLOBAL)

    # Append the device buffer to device bindings.
    bindings.append(int(mem.base.get_device_pointer()))

    # Append to the appropriate list.
    if engine.binding_is_input(binding):
        inputs.append(mem)
    else:
        outputs.append(mem)
```
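To frame the difference between the two versions, here is a toy, pure-Python model (no pycuda or GPU involved; the class names `ExplicitCopyBuffers` and `ManagedBuffer` are hypothetical). In the first version the host and device buffers are separate allocations, so data reaches the device only after an explicit copy. In the managed version there is one allocation, so host writes are already visible through the pointer handed to TensorRT and there is nothing to copy. The model ignores synchronization: with real managed memory the CPU should still wait for the GPU work (for example with `stream.synchronize()`) before reading outputs or writing the next input.

```python
# Toy, pure-Python model of the two buffer layouts; it does NOT call pycuda.
# "host" and "device" are just names for Python buffers or views.


class ExplicitCopyBuffers:
    """Models pagelocked_empty + mem_alloc: two separate allocations."""

    def __init__(self, nbytes: int) -> None:
        self.host = bytearray(nbytes)
        self.device = bytearray(nbytes)

    def htod(self) -> None:
        # Stand-in for cuda.memcpy_htod_async: copy bytes host -> device.
        self.device[:] = self.host

    def dtoh(self) -> None:
        # Stand-in for cuda.memcpy_dtoh_async: copy bytes device -> host.
        self.host[:] = self.device


class ManagedBuffer:
    """Models managed_empty: one allocation seen from both sides."""

    def __init__(self, nbytes: int) -> None:
        self._storage = bytearray(nbytes)
        # Both views alias the same storage, so no copy is ever needed.
        self.host = memoryview(self._storage)
        self.device = memoryview(self._storage)


explicit = ExplicitCopyBuffers(4)
explicit.host[:] = b"\x01\x02\x03\x04"
before_copy = bytes(explicit.device)  # still zeros: no copy has happened yet
explicit.htod()
after_copy = bytes(explicit.device)   # now matches the host data

managed = ManagedBuffer(4)
managed.host[:] = b"\x01\x02\x03\x04"
seen_by_device = bytes(managed.device)  # same bytes without any copy step
print(before_copy, after_copy, seen_by_device)
```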

The change seems to have had no effect; everything behaves the same.
The memory usage of the Jetson AGX is the same as before I modified the code.
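One possible reason the change is hard to see: on Jetson the CPU and GPU share the same physical DRAM, so the original code holds each binding twice (a pinned host buffer plus a device buffer), while the managed version holds it once. The saving is only the size of the I/O buffers, which may be small next to the TensorRT engine and CUDA context. The sketch below estimates that saving; the binding shapes are hypothetical (a 416x416 YOLOv3 engine with float32 bindings and `max_batch_size = 1`), and the real shapes should be read from the engine.

```python
# Rough byte accounting for TensorRT I/O bindings under the two allocation
# schemes. Shapes are HYPOTHETICAL (416x416 YOLOv3, float32, batch size 1).
from math import prod

FLOAT32_BYTES = 4

# Hypothetical per-item binding shapes, input first.
BINDING_SHAPES = {
    "input": (3, 416, 416),
    "output_13": (255, 13, 13),
    "output_26": (255, 26, 26),
    "output_52": (255, 52, 52),
}


def binding_nbytes(shape, max_batch_size=1, itemsize=FLOAT32_BYTES):
    """Mirrors trt.volume(shape) * max_batch_size, times the element size."""
    return prod(shape) * max_batch_size * itemsize


def footprint(shapes, buffers_per_binding):
    """Total bytes if every binding gets `buffers_per_binding` allocations."""
    return sum(binding_nbytes(s) for s in shapes.values()) * buffers_per_binding


explicit_bytes = footprint(BINDING_SHAPES, 2)  # pinned host + device buffer
managed_bytes = footprint(BINDING_SHAPES, 1)   # one managed buffer
print(f"explicit copies: {explicit_bytes / 2**20:.2f} MiB")
print(f"managed memory:  {managed_bytes / 2**20:.2f} MiB")
print(f"difference:      {(explicit_bytes - managed_bytes) / 2**20:.2f} MiB")
```

Under these assumptions the difference is only a few MiB, which could easily disappear in a system-wide memory reading.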

Furthermore, I found that the memory usage stays the same when I run `[cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]` and `[cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]`. It seems that nothing is copied between host and device. Or is `cuda.memcpy_htod_async` just a pointer copy?
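On the last question: `cuda.memcpy_htod_async` wraps the CUDA driver's `cuMemcpyHtoDAsync` and does copy bytes, but it copies into device memory that `cuda.mem_alloc` reserved earlier. The copy itself allocates nothing, so overall memory usage would not be expected to change while it runs. The toy model below (pure Python; `data_copy` and `pointer_copy` are hypothetical names) contrasts such a byte copy with a pointer copy by modifying the source afterwards.

```python
# Toy model contrasting a byte copy with a pointer (alias) copy.
# Pure Python; no CUDA is involved.


def data_copy(dst: bytearray, src: bytearray) -> bytearray:
    """Copy bytes into an already-allocated buffer, as memcpy does."""
    dst[:] = src  # same length: overwrites dst in place, its size is unchanged
    return dst


def pointer_copy(src: bytearray) -> bytearray:
    """Hand out another reference to the same buffer; no bytes are copied."""
    return src


host = bytearray(b"\x01\x02\x03\x04")
device = bytearray(len(host))  # preallocated once, like cuda.mem_alloc

copied = data_copy(device, host)
aliased = pointer_copy(host)

host[0] = 0xFF  # change the source after the "copy"
print(copied is device, copied[0])  # independent buffer keeps the old value
print(aliased is host, aliased[0])  # alias sees the change
```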

Could you give me more information about how to avoid memory copies between host and device? Looking forward to your reply!

caoxuefengzz, Jul 08 '22 06:07