DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order are supported.
Description
test_cuda_shared_memory.py fails when the batch dimension is smaller than 2 (e.g. a tensor shape of (1, 2, 4)). I think this issue comes from PyTorch, but I'm wondering if there is any workaround that makes it work with PyTorch versions > 1.12.
Workaround
Pin PyTorch to version 1.12. I tested this and it works fine.
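For context, the error above comes from a C-order contiguity check on the strides that the DLPack capsule reports. A minimal pure-Python sketch of such a check (the helper names here are illustrative, not the actual Triton client code) shows why a size-1 batch dimension can trip it: newer PyTorch versions may export a stride for a size-1 dimension that differs from the nominal C-order stride, even though the memory layout is unchanged.

```python
def c_contiguous_strides(shape, itemsize=4):
    """Byte strides a dense C-ordered tensor of `shape` would have."""
    strides = [0] * len(shape)
    acc = itemsize
    for i in reversed(range(len(shape))):
        strides[i] = acc
        acc *= shape[i]
    return strides

def is_c_contiguous(shape, strides, itemsize=4):
    """Naive check: strides must match the nominal C-order strides exactly."""
    return strides == c_contiguous_strides(shape, itemsize)

# For shape (1, 2, 4) and FP32, the nominal C-order byte strides are [32, 16, 4].
print(c_contiguous_strides((1, 2, 4)))

# A different stride in the size-1 batch dimension describes exactly the same
# memory (that index can only ever be 0), but the naive comparison rejects it,
# producing a "not contiguous" error for a tensor that is in fact dense.
print(is_c_contiguous((1, 2, 4), [4, 16, 4]))
```

This is only a sketch of the failure mode; the real fix would be for the consumer to treat strides of size-1 dimensions as "don't care" when deciding contiguity.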
Triton Information
What version of Triton client are you using?
Compiled from latest version b0b5b27c590a1c9f47e01f760e7b061c16d92af1.
To Reproduce
```python
import unittest

import torch
import tritonclient.utils.cuda_shared_memory as cudashm


class DLPackTest(unittest.TestCase):
    """
    Testing DLPack implementation in CUDA shared memory utilities
    """

    def test_from_gpu(self):
        # Create GPU tensor via PyTorch and CUDA shared memory region with
        # enough space
        tensor_shape = (1, 2, 4)
        gpu_tensor = torch.ones(tensor_shape).cuda(0)
        byte_size = gpu_tensor.nelement() * gpu_tensor.element_size()
        shm_handle = cudashm.create_shared_memory_region(
            "cudashm_data", byte_size, 0)

        # Set data from DLPack specification of PyTorch tensor
        cudashm.set_shared_memory_region_from_dlpack(shm_handle, [gpu_tensor])

        # Make sure the DLPack specification of the shared memory region can
        # be consumed by PyTorch
        smt = cudashm.as_shared_memory_tensor(shm_handle, "FP32", tensor_shape)
        generated_torch_tensor = torch.from_dlpack(smt)
        self.assertTrue(torch.allclose(gpu_tensor, generated_torch_tensor))

        cudashm.destroy_shared_memory_region(shm_handle)
```