
gpu_ctc calculates 0 loss

ckonniefan opened this issue · 6 comments

test_gpu and test_cpu both pass. But when I use gpu_ctc to calculate the cost, the output is 0, while cpu_ctc outputs 2.46. My PyTorch version is 1.0, Python 2.7, CUDA 10.0.

  1. probs on CUDA
import warpctc_pytorch as warpctc
from warpctc_pytorch import CTCLoss

import torch

ctc_loss = CTCLoss()
# expected shape of seqLength x batchSize x alphabet_size
probs = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]]]).transpose(0, 1).contiguous().cuda()
labels = torch.IntTensor([1, 2])
label_sizes = torch.IntTensor([2])
probs_sizes = torch.IntTensor([2])
probs.requires_grad_(True)  # tells autograd to compute gradients for probs
cost = ctc_loss(probs, labels, probs_sizes, label_sizes)
output: cost = tensor([0.])
  2. probs on CPU
import warpctc_pytorch as warpctc
from warpctc_pytorch import CTCLoss

import torch

ctc_loss = CTCLoss()
# expected shape of seqLength x batchSize x alphabet_size
probs = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]]]).transpose(0, 1).contiguous()
labels = torch.IntTensor([1, 2])
label_sizes = torch.IntTensor([2])
probs_sizes = torch.IntTensor([2])
probs.requires_grad_(True)  # tells autograd to compute gradients for probs
cost = ctc_loss(probs, labels, probs_sizes, label_sizes)
output: cost = tensor([2.4629])
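
For reference, here is a minimal side-by-side check in the spirit of tests/test_gpu.py; the helper name compare_ctc_costs is mine, not part of warp-ctc. It feeds the same activations to CTCLoss once on the CPU and once on CUDA and prints both costs.

import torch
from warpctc_pytorch import CTCLoss

def compare_ctc_costs():
    ctc_loss = CTCLoss()
    # expected shape of seqLength x batchSize x alphabet_size
    acts = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1],
                               [0.1, 0.1, 0.6, 0.1, 0.1]]]).transpose(0, 1).contiguous()
    labels = torch.IntTensor([1, 2])
    label_sizes = torch.IntTensor([2])
    probs_sizes = torch.IntTensor([2])

    # same activations, CPU copy vs CUDA copy
    cpu_probs = acts.clone().requires_grad_(True)
    cpu_cost = ctc_loss(cpu_probs, labels, probs_sizes, label_sizes)

    gpu_probs = acts.clone().cuda().requires_grad_(True)
    gpu_cost = ctc_loss(gpu_probs, labels, probs_sizes, label_sizes)

    # on an affected setup this prints CPU cost: 2.4629  GPU cost: 0.0000
    print("CPU cost: %.4f  GPU cost: %.4f" % (cpu_cost.item(), gpu_cost.item()))

compare_ctc_costs()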

ckonniefan avatar Dec 19 '18 10:12 ckonniefan

@ckonniefan I have seen the same problem. Have you solved it?

wuliebucha avatar Jan 15 '19 06:01 wuliebucha

@ckonniefan @wuliebucha Has anyone fixed it?

eastonYi avatar Mar 17 '19 03:03 eastonYi

I am having the same problem. However, in my case it happens randomly :( @ckonniefan @wuliebucha Did you solve this problem?

khassanoff avatar May 10 '19 00:05 khassanoff

I'm having the same issue. Has anyone been able to look into this? @SeanNaren @ckonniefan @wuliebucha

My environment is:

  • Ubuntu 16.04
  • Python 3.6.5 :: Anaconda, Inc.
  • Pytorch 1.3.1
  • I used the workaround at the bottom of this thread to complete my setup: https://github.com/SeanNaren/warp-ctc/issues/30

The output from tests/test_gpu.py is below:

libs/warp-ctc/pytorch_binding/tests/test_gpu.py
CPU_cost: 2.462858 GPU_cost: 0.000000
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], device='cuda:0')
.CPU_cost: 6.016518 GPU_cost: 0.000000
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], device='cuda:0')
.CPU_cost: 199.966187 GPU_cost: 0.000000
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], device='cuda:0')
.CPU_cost: 6.416517 GPU_cost: 0.000000
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], device='cuda:0')

dzubke avatar Jan 18 '20 00:01 dzubke

I found that I didn't encounter this error when using an Nvidia Tesla K80 or P100 GPU. test_gpu.py runs as expected, with the CPU and GPU costs nearly identical, and the warp-ctc PyTorch bindings work within the model I'm using them in.

However, I get this error when using the GPUs below:

  • Nvidia V100
  • Nvidia P4

I have no idea why it differs between GPUs. My setup is described in my previous comment.

dzubke avatar Jan 22 '20 19:01 dzubke

I also encountered this problem, using a V100 GPU. I used a little trick to avoid it: train the model on the GPU, but compute the loss on the CPU. Before calculating the loss with CTCLoss, I convert the probs to CPU.
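
A minimal sketch of that workaround, reusing the tensors from the example at the top of the thread (in practice probs would be the network output, still on the GPU):

import torch
from warpctc_pytorch import CTCLoss

ctc_loss = CTCLoss()
# stand-in for the model output on the GPU
probs = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1],
                            [0.1, 0.1, 0.6, 0.1, 0.1]]]).transpose(0, 1).contiguous().cuda()
probs.requires_grad_(True)
labels = torch.IntTensor([1, 2])
label_sizes = torch.IntTensor([2])
probs_sizes = torch.IntTensor([2])

# move only the activations to the CPU so CTCLoss dispatches to cpu_ctc instead of gpu_ctc
cost = ctc_loss(probs.cpu(), labels, probs_sizes, label_sizes)
cost.backward()  # autograd tracks the .cpu() copy, so probs.grad lands on the CUDA tensor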

lqyiii avatar Apr 21 '20 09:04 lqyiii