
CUDA out of memory

MengXinChengXuYuan opened this issue · 0 comments

Hi, I am running into a CUDA out-of-memory error when using CTC as the loss, with pytorch=0.4.1 and python=3.6 on a machine with 8 GTX Titan X GPUs.

This looks like what is mentioned in #118, and is essentially the same as #76 reported by mankeyboy.

My input to CTC has shape (4, 128, 10): 128 is the batch size, and each of the 128 samples is a sequence of 4 timesteps over an alphabet of 10, with one label per sample (128 labels in total). label_sizes and probs_sizes are defined as:

    label_sizes = torch.IntTensor([1])
    probs_sizes = torch.IntTensor([4])

Even though I reduce the batch size to 4, the error still occurs. As mentioned in #118, I suspect my label_sizes or probs_sizes may be incorrect, since I don't fully understand what is described in the readme; my current guess is sketched just below.
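
If I read the readme correctly, labels should be a single flat IntTensor of all targets concatenated, and label_sizes and probs_sizes should each hold one entry per batch element. For my real batch of 128 that would look roughly like this (placeholder values, and quite possibly the wrong reading):

    import torch

    batch_size = 128   # my real batch size
    seq_len = 4        # timesteps per sample
    # one target symbol per sample, so 128 labels concatenated into one tensor
    labels = torch.IntTensor([1] * batch_size)              # placeholder targets, shape (128,)
    label_sizes = torch.IntTensor([1] * batch_size)         # target length of each sample, shape (128,)
    probs_sizes = torch.IntTensor([seq_len] * batch_size)   # valid timesteps of each sample, shape (128,)

Is that the intended usage, or do these tensors really only need a single entry each?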

To narrow things down, I also tried the minimal code below:

        import numpy as np
        import torch
        from warpctc_pytorch import CTCLoss

        ctc_loss = CTCLoss()
        # probs: (seq_len=4, batch=4, alphabet=10), all zeros just to exercise the call
        probs = np.zeros((4, 4, 10)).astype(np.float32)
        probs = torch.from_numpy(probs).contiguous()
        labels = np.zeros((4,)).astype(np.int32)
        labels = torch.from_numpy(labels)
        label_sizes = torch.IntTensor([1])
        probs_sizes = torch.IntTensor([4])
        probs.requires_grad_(True)  # tells autograd to compute gradients for probs
        cost = ctc_loss(probs, labels, probs_sizes, label_sizes)

This time there is no memory error, but instead:

    result = torch._C._safe_call(*args, **kwargs)
    torch.FatalError: std::bad_alloc

However, the example provided in the readme works fine for me. To make sure I understand the parameters correctly, I modified the sample code slightly:

        # probs is (2, 3, 5) as written and (3, 2, 5) after transpose(0, 1)
        probs = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1],
                                    [0.1, 0.1, 0.6, 0.1, 0.1],
                                    [0.1, 0.1, 0.6, 0.1, 0.1]],
                                   [[0.1, 0.6, 0.1, 0.1, 0.1],
                                    [0.1, 0.1, 0.6, 0.1, 0.1],
                                    [0.1, 0.1, 0.6, 0.1, 0.1]]]).transpose(0, 1).contiguous()
        labels = torch.IntTensor([1, 3])
        label_sizes = torch.IntTensor([1])
        probs_sizes = torch.IntTensor([3])
        probs.requires_grad_(True)  # tells autograd to compute gradients for probs
        cost = ctc_loss(probs, labels, probs_sizes, label_sizes)
        cost.backward()

It also works fine! What could be wrong? Am I misunderstanding the meaning of label_sizes and probs_sizes?
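
For reference, this is how I would extend the readme example to a batch of two samples under my (possibly wrong) reading of the parameters, building probs directly in (seq_len, batch, alphabet) layout:

    import torch
    from warpctc_pytorch import CTCLoss

    ctc_loss = CTCLoss()
    # probs: (seq_len=2, batch=2, alphabet=5), activations for both samples at each timestep
    probs = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1],   # t = 0, sample 0
                                [0.1, 0.1, 0.6, 0.1, 0.1]],  # t = 0, sample 1
                               [[0.1, 0.1, 0.6, 0.1, 0.1],   # t = 1, sample 0
                                [0.1, 0.6, 0.1, 0.1, 0.1]]]).contiguous()
    labels = torch.IntTensor([1, 2, 2, 1])    # sample 0 -> [1, 2], sample 1 -> [2, 1], concatenated
    label_sizes = torch.IntTensor([2, 2])     # two target symbols per sample
    probs_sizes = torch.IntTensor([2, 2])     # both sequences use all 2 timesteps
    probs.requires_grad_(True)
    cost = ctc_loss(probs, labels, probs_sizes, label_sizes)
    cost.backward()

If that is the right way to batch things, I will adapt my training code accordingly; if not, an explanation of what the two size tensors mean would be much appreciated.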

Here is the simplified output log of the original out-of-memory error:

result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: CUDA error: out of memory (allocate at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCCachingAllocator.cpp:510)
frame #0: THCudaMalloc + 0x79 (0x7fd51031ad79 in /home/liusiyao/miniconda3/envs/py3.6_torch041/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #1: gpu_ctc + 0x12e (0x7fd58633de3e in /home/liusiyao/miniconda3/envs/py3.6_torch041/lib/python3.6/site-packages/warpctc_pytorch-0.1-py3.6-linux-x86_64.egg/warpctc_pytorch/_warp_ctc/__warp_ctc.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x2672 (0x7fd58633d672 in /home/liusiyao/miniconda3/envs/py3.6_torch041/lib/python3.6/site-packages/warpctc_pytorch-0.1-py3.6-linux-x86_64.egg/warpctc_pytorch/_warp_ctc/__warp_ctc.cpython-36m-x86_64-linux-gnu.so)
......
frame #46: __libc_start_main + 0xf0 (0x7fd589acf830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #47: <unknown function> + 0x1ccd7f (0x564998c84d7f in /home/liusiyao/miniconda3/envs/py3.6_torch041/bin/python)

MengXinChengXuYuan, Jun 04 '19 09:06