
DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order are supported.

Open kennyvoo opened this issue 3 months ago • 0 comments

Description

test_cuda_shared_memory.py fails when the batch-size dimension is smaller than 2. I suspect the issue comes from PyTorch, but I'm wondering whether there is a workaround that makes this work with PyTorch versions newer than 1.12.
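
For context on where the error in the title likely comes from: the client's DLPack utilities appear to reject any tensor whose reported strides are not exactly the canonical C-order strides. The sketch below is my reading of that kind of strict check, not the actual tritonclient code, and the helper names are made up. A size-1 dimension (like a batch of 1) can carry an arbitrary stride without changing the memory layout, and newer PyTorch releases may report a normalized stride there, which a strict element-wise comparison rejects.

import torch

def expected_c_strides(shape):
    # Canonical C-order (row-major) strides, in elements.
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

def strict_is_c_contiguous(shape, strides):
    # Strict element-wise comparison: rejects any deviation, even on
    # size-1 dimensions where the stride value cannot affect the layout.
    return tuple(strides) == expected_c_strides(shape)

def relaxed_is_c_contiguous(shape, strides):
    # Ignores strides on size-1 dimensions, which is sufficient for
    # the data to actually be stored in C order.
    acc = 1
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != acc:
            return False
        acc *= dim
    return True

t = torch.ones((1, 2, 4))
print(t.stride())  # (8, 4, 1) here; a DLPack export may report the
                   # leading size-1 stride differently
print(strict_is_c_contiguous(t.shape, t.stride()))   # True for (8, 4, 1)
print(relaxed_is_c_contiguous((1, 2, 4), (1, 4, 1))) # True: same layout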

Workaround

Downgrade to torch 1.12. I tested it and it works fine.
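
As an alternative to downgrading, one possible workaround (a sketch, untested here) is to rebuild the tensor view with explicit canonical C-order strides before handing it to cudashm. with_c_strides is a made-up helper; Tensor.as_strided is a real PyTorch API. Whether this actually helps depends on whether PyTorch preserves these strides when exporting through __dlpack__; if the export normalizes size-1 strides anyway, the check would have to be relaxed on the client side instead.

import torch

def with_c_strides(t: torch.Tensor) -> torch.Tensor:
    # Recreate the view with explicit canonical C-order strides so that
    # size-1 dimensions carry exactly the stride a strict checker expects.
    strides, acc = [], 1
    for dim in reversed(t.shape):
        strides.append(acc)
        acc *= dim
    return t.contiguous().as_strided(t.shape, tuple(reversed(strides)))

gpu_tensor = torch.ones((1, 2, 4)).cuda(0)
fixed = with_c_strides(gpu_tensor)
print(gpu_tensor.stride(), fixed.stride())

# Then, as in the repro below:
# cudashm.set_shared_memory_region_from_dlpack(shm_handle, [fixed])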

Triton Information

What version of Triton client are you using?

Compiled from the latest version, commit b0b5b27c590a1c9f47e01f760e7b061c16d92af1.

To Reproduce

import unittest

import torch
import tritonclient.utils.cuda_shared_memory as cudashm


class DLPackTest(unittest.TestCase):
    """
    Testing DLPack implementation in CUDA shared memory utilities
    """

    def test_from_gpu(self):
        # Create GPU tensor via PyTorch and CUDA shared memory region with
        # enough space
        tensor_shape = (1, 2, 4)
        gpu_tensor = torch.ones(tensor_shape).cuda(0)
        byte_size = gpu_tensor.nelement() * gpu_tensor.element_size()

        shm_handle = cudashm.create_shared_memory_region("cudashm_data", byte_size, 0)

        # Set data from DLPack specification of PyTorch tensor
        cudashm.set_shared_memory_region_from_dlpack(shm_handle, [gpu_tensor])

        # Make sure the DLPack specification of the shared memory region can
        # be consumed by PyTorch
        smt = cudashm.as_shared_memory_tensor(shm_handle, "FP32", gpu_tensor)
        generated_torch_tensor = torch.from_dlpack(smt)
        self.assertTrue(torch.allclose(gpu_tensor, generated_torch_tensor))

        cudashm.destroy_shared_memory_region(shm_handle)
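
On PyTorch releases newer than 1.12, the set_shared_memory_region_from_dlpack call above fails with the error in the title; on torch 1.12 the test passes.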

kennyvoo · Mar 19 '24 02:03