char-rnn.pytorch
Tensor Size Mismatch During Training
While training, it fails at a seemingly random point with this error (with both lstm and gru):
Traceback (most recent call last):
File "train.py", line 98, in <module>
loss = train(*random_training_set(args.chunk_len, args.batch_size))
File "train.py", line 43, in random_training_set
inp[bi] = char_tensor(chunk[:-1])
RuntimeError: The expanded size of the tensor (200) must match the existing size (199) at non-singleton dimension 0. Target sizes: [200]. Tensor sizes: [199]
I've attached the data set I've been using.
I've been having the same issue. However, I only encounter it when I have a large epoch number. What epoch number did you have set in the arguments? By large I mean 100,000 or so.
I didn't set one; I just used the defaults. It seems to happen somewhat randomly. Sometimes it gets a few epochs in, other times it crashes almost immediately.
I ran into the same problem, and it's indeed caused by this line in train.py:
start_index = random.randint(0, file_len - chunk_len)
Subtracting 1 from the right boundary should fix the problem:
start_index = random.randint(0, file_len - chunk_len - 1)
This is because for randint(a, b), the right boundary b is included. So if the sampler happens to select b, the end index falls outside the file boundary and the error above is raised.
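For reference, here's a quick standalone check (not from the repo) showing that random.randint includes its upper bound:

import random

# random.randint(a, b) samples from the closed interval [a, b]:
# the upper bound b can be returned, unlike range(a, b) or slicing.
random.seed(0)
samples = {random.randint(0, 2) for _ in range(1000)}
print(samples)  # {0, 1, 2} -- 2, the upper bound, is included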
To be specific, in the following two lines,

end_index = start_index + chunk_len + 1
chunk = file[start_index:end_index]

the slicing operation wouldn't raise an out-of-bounds error. Instead, Python silently clamps the end of the slice to len(file), so the resulting chunk holds only chunk_len characters instead of chunk_len + 1. chunk[:-1] then yields chunk_len - 1 characters, which is where the 199-vs-200 size mismatch comes from.
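To make the failure mode concrete, here is a minimal sketch (not the repo's actual code; file stands in for the training text loaded as a single string) that reproduces the off-by-one and shows the fix:

import random

# `file` stands in for the training text loaded as one string.
file = "x" * 1000
file_len = len(file)
chunk_len = 200

# Buggy case: randint's upper bound is inclusive, so start_index can
# be file_len - chunk_len, which pushes end_index one past the file.
start_index = file_len - chunk_len       # worst case the sampler can pick
end_index = start_index + chunk_len + 1  # == file_len + 1
chunk = file[start_index:end_index]      # slicing clamps silently
print(len(chunk))        # 200 instead of the expected 201
print(len(chunk[:-1]))   # 199 -> the "199 vs 200" size mismatch

# Fixed: shrink randint's upper bound by one so end_index <= file_len.
start_index = random.randint(0, file_len - chunk_len - 1)
end_index = start_index + chunk_len + 1
assert len(file[start_index:end_index]) == chunk_len + 1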