
Can't run on GPU

Open gaopeng990618 opened this issue 1 year ago • 4 comments

Hi, thank you so much for sharing your work. I am trying to reproduce the results using the CMU-MOSEI dataset you provided. The code raises an error on GPU but runs without problems on CPU. I traced the issue to rnn.py and changed PackSequence so that x and lengths are on the same device:

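import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class PackSequence(nn.Module):
    def __init__(self, batch_first=True, device="cpu"):
        super(PackSequence, self).__init__()
        self.batch_first = batch_first
        self.device = device

    def forward(self, x, lengths):
        # Pack the padded batch; enforce_sorted=False lets PyTorch sort by length internally.
        x = pack_padded_sequence(
            x, lengths, batch_first=self.batch_first, enforce_sorted=False
        )
        # Move lengths to the configured device (expected to match x's device)
        # before indexing with x.sorted_indices.
        lengths = lengths.to(self.device)
        lengths = lengths[x.sorted_indices]

        return x, lengths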
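
For comparison, a variant that does not need a stored device could look like the sketch below. It is only an illustration, not the repository's code: the class name DeviceSafePackSequence is hypothetical, and it assumes a PyTorch version in which pack_padded_sequence expects lengths as a CPU tensor, so lengths is copied to the CPU for packing and then moved to x.device for indexing.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class DeviceSafePackSequence(nn.Module):  # hypothetical name, not from the repo
    def __init__(self, batch_first=True):
        super().__init__()
        self.batch_first = batch_first

    def forward(self, x, lengths):
        # Assumption: pack_padded_sequence wants `lengths` on the CPU.
        packed = pack_padded_sequence(
            x, lengths.cpu(), batch_first=self.batch_first, enforce_sorted=False
        )
        # Reorder lengths to match the packed batch order, on the same device as x.
        sorted_lengths = lengths.to(x.device)[packed.sorted_indices]
        return packed, sorted_lengths
```

Because the device is taken from x at call time, the same module should work unchanged on CPU and GPU.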

With my change to PackSequence the code runs on the GPU, but the training loss becomes NaN from the second batch onward. Could you please suggest why this happens?

return loss_value:4.928342342376709
Epoch [1/100]: [1/511]   0%|                         , Train Loss=4.93 [00:00<?]
return loss_value:nan
Epoch [1/100]: [2/511]   0%|                      , Train Loss=nan [00:00<00:54]
return loss_value:nan
Epoch [1/100]: [2/511]   0%|                      , Train Loss=nan [00:00<00:54]
return loss_value:nan
Epoch [1/100]: [4/511]   1%|▏                     , Train Loss=nan [00:00<00:42]
return loss_value:nan
Epoch [1/100]: [4/511]   1%|▏                     , Train Loss=nan [00:00<00:42]
return loss_value:nan
Epoch [1/100]: [6/511]   1%|▎                     , Train Loss=nan [00:00<00:38]
return loss_value:nan
Epoch [1/100]: [6/511]   1%|▎                     , Train Loss=nan [00:00<00:38]
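
To pin down the first step at which the loss stops being finite, a log like the one above could be scanned with a small helper. This is a minimal sketch: the function name first_nonfinite_step is hypothetical, and it assumes each loss is printed on a line of the form "return loss_value:<number>" as in my output.

```python
import math
import re

# Matches the "return loss_value:<number>" lines from the training output.
LOSS_LINE = re.compile(r"return loss_value:(\S+)")


def first_nonfinite_step(log_text):
    """Return the 1-based index of the first logged loss that is NaN or inf, else None."""
    step = 0
    for line in log_text.splitlines():
        match = LOSS_LINE.search(line)
        if match is None:
            continue  # skip progress-bar lines
        step += 1
        if not math.isfinite(float(match.group(1))):
            return step
    return None
```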

gaopeng990618 · Jul 31 '23 07:07