
RuntimeError: shape '[1]' is invalid for input of size 128

Open clayton-north opened this issue 5 years ago • 1 comment

This is what my code looks like during training:

        # training step

        # update to have two outputs
        output_class,output_nextchars = model(line_tensor)
        #  predicted class vs actual class
        print('out_class.shape=', output_class.shape)
        print('category_tensor.shape=', category_tensor.shape)
        loss_class = criterion(output_class, category_tensor)

        loss_nextchars_perchar = torch.zeros(output_nextchars.shape[0])
        for i in range(output_nextchars.shape[0] - 1):
            print('out_nextchars.shape=', output_nextchars.shape)
            print('line_tensor.shape=', line_tensor.shape)
            print('out_nextchars[i,:].shape=', output_nextchars[i,:].shape)
            print('line_tensor[i+1,:].shape=', line_tensor[i+1,:].shape)
            _, nextchar_i = line_tensor[i+1,:].topk(1)
            print('nextchar_i.shape=', nextchar_i.shape)
            nextchar_i= nextchar_i.view([1])
            loss_nextchars_perchar[i] = criterion(output_nextchars[i,:], nextchar_i)

        loss_nextchars = torch.mean(loss_nextchars_perchar)

        loss = loss_class + loss_nextchars
        loss.backward()
        grad_norm = sum([ torch.norm(p.grad)**2 for p in model.parameters() if p.grad is not None])**(1/2)
        if args.gradient_clipping:
            torch.nn.utils.clip_grad_norm_(model.parameters(),1.0)
        optimizer.step()

and the output:

out_class.shape= torch.Size([128, 18])
category_tensor.shape= torch.Size([128])
out_nextchars.shape= torch.Size([14, 128, 58])
line_tensor.shape= torch.Size([14, 128, 58])
out_nextchars[i,:].shape= torch.Size([128, 58])
line_tensor[i+1,:].shape= torch.Size([128, 58])
nextchar_i.shape= torch.Size([128, 1])
Traceback (most recent call last):
  File "names.py", line 252, in <module>
    nextchar_i= nextchar_i.view([1])
RuntimeError: shape '[1]' is invalid for input of size 128

By changing the line

nextchar_i= nextchar_i.view([1])

to

nextchar_i= nextchar_i.view([128])

the error went away, but when I run my code the loss seems very high and the accuracy very low. [image]

I was wondering what would be the best way to go about solving this problem.

Here is my model code:

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        if args.model == 'gru':
            self.rnn = nn.GRU(len(vocabulary), args.hidden_layer_size, args.num_layers)
        elif args.model == 'lstm':
            self.rnn = nn.LSTM(len(vocabulary), args.hidden_layer_size, args.num_layers)
        elif args.model == 'rnn':
            self.rnn = nn.RNN(len(vocabulary), args.hidden_layer_size, args.num_layers)
        # linear layer for predicting the class
        self.fc_class = nn.Linear(args.hidden_layer_size, len(all_categories))
        # second linear layer for predicting the next characters
        self.fc_nextchars = nn.Linear(args.hidden_layer_size, len(vocabulary))

    def forward(self, x):
        # out is a 3rd-order tensor: (line length, batch size, hidden layer size)
        out, h_n = self.rnn(x)
        out_class = self.fc_class(out[out.shape[0] - 1, :, :])
        out_nextchars = torch.zeros(out.shape[0], out.shape[1], len(vocabulary))
        for i in range(out.shape[0]):
            out_nextchars[i, :, :] = self.fc_nextchars(out[i, :, :])
        return out_class, out_nextchars

clayton-north commented on Mar 31 '20 at 23:03

The error is occurring on this line:

nextchar_i= nextchar_i.view([1])

and what the error is saying is that you are trying to change the shape of nextchar_i to [1], but nextchar_i has 128 elements in it, so you can't do that. My guess is that the 128 comes from the fact that you are using a batch size of 128.
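
For context, here is a minimal reproduction of that error, assuming only the [128, 1] shape that your debug prints report for nextchar_i:

import torch

x = torch.zeros(128, 1)  # same shape the prints report for nextchar_i
x.view([1])              # RuntimeError: shape '[1]' is invalid for input of size 128
# view() can only reinterpret the existing 128 elements; it cannot shrink them down to 1.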

To fix this, we need to dynamically figure out the size of nextchar_i and use that instead of hard-coding 1. One way to do this is to loop over every element in nextchar_i.shape and multiply them together (although there are many other ways to do this as well); a sketch is below.
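
A rough sketch of that idea, using the variable names from the training snippet above (view(-1) and numel() are equivalent shortcuts):

# Compute the flattened size of nextchar_i at runtime instead of
# hard-coding 1 (or 128, which only works for one particular batch size).
flat_size = 1
for dim in nextchar_i.shape:
    flat_size *= dim
nextchar_i = nextchar_i.view([flat_size])

# Equivalent shortcuts:
# nextchar_i = nextchar_i.view(-1)
# nextchar_i = nextchar_i.view(nextchar_i.numel())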

mikeizbicki commented on Mar 31 '20 at 23:03