tvm icon indicating copy to clipboard operation
tvm copied to clipboard

[Bug] NDArray::IsAligned Can't Work for OpenCL, Vulkan Device API, should it is removed?

Open Johnson9009 opened this issue 1 year ago • 2 comments

When using OpenCL + Graph Executor + set_input_zero_copy, we found below code will raise exception, actually we know the NDArray is allocated by the OpenCL device API, it should satisfy the alignment requirement. image

After browsing some code of OpenCL device API, we found that OpenCL device API finally will return a pointer of host structure BufferDescriptor, and this pointer is convert to void*, and store in the data field of DLTensor. image This is the reason why the 1st figure alignment check failed, the vulkan device API use the same mechanism, so it have the same problem.

Why Relay VM haven't meet this failed check? image image Acturally Relay VM is impacted by this issue too, the different is Relay VM won't raise exception, but do a non-need copy.

Even through we will pass the aligement argument to the interface virtual void* AllocDataSpace(Device dev, size_t nbytes, size_t alignment, DLDataType type_hint) = 0; of class DeviceAPI, but I found except CPU, it seems all of other device can't avoid to specify aligment when alloc data through their runtime API. So why we need this alignment? can we deleted these checks?

Johnson9009 avatar Mar 20 '24 07:03 Johnson9009

@jwfromm @tqchen @junrushao It seems previous discussion in https://github.com/apache/tvm/pull/12564 is relevant.

Johnson9009 avatar Mar 20 '24 07:03 Johnson9009

I think main issue is that the user might slice the memory in zero copy mode, but we could update IsAligned to ensure that we always return true for Tensors in OpenCL VK backend assuming the byte_offset is 0

tqchen avatar Apr 02 '24 15:04 tqchen