practical-torchtext
Possible bug when calculating running loss
Hi there!
I've been looking at your training loop in the text classification tutorial, and I see that you accumulate a running loss over the batches and then average it to get the mean loss once the loop ends.
Looking at this line, though, I'm a little confused:
running_loss += loss.data[0] * x.size(0)
When I print out the shape of x, I get these values.
The batch size appears to be at index 1, so should the running loss line read
running_loss += loss.data[0] * x.size(1)
?
Entirely possible that I've made the mistake here but thought I would ask anyway!
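For context, here is a minimal sketch of where that index comes from, assuming the tutorial's fields use torchtext's default batch_first=False (the sizes below are made-up placeholders, not the values from the original post):

import torch

# With batch_first=False (torchtext's default), the iterator yields batches
# shaped [seq_len, batch_size], so the number of examples sits at index 1.
x = torch.zeros(494, 25, dtype=torch.long)  # placeholder: seq_len=494, batch_size=25

print(x.size(0))  # 494 -> padded sequence length (varies per batch)
print(x.size(1))  # 25  -> number of examples in the batch

If the fields were built with batch_first=True instead, the batch size would sit at index 0 and the original line would be correct.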
Hello, I have the same issue. Have you ever solved it? Is just running_loss += loss.data[0] * x.size(1) OK?
running_loss += loss.item() * x.size(0) works!!!
I ended up using .size(1). It seems to work.
x.size(0) is the (padded) length of comment_text for the current batch, which torchtext pads automatically in advance. Why the scalar loss is multiplied by x.size(0) is not clear. Maybe the author considered it something like a batch weight, but it still makes NO sense.
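For anyone landing on this thread later: the reason for multiplying the per-batch mean loss by the number of examples is so that, after dividing by the total example count, you get a true per-example average even when the last batch is smaller. Below is a minimal, self-contained sketch of that accumulation, assuming sequence-first batches as above; the model, loss, and fake data are placeholders rather than the tutorial's code, and loss.item() replaces the deprecated loss.data[0]:

import torch
import torch.nn as nn

# Toy stand-ins for the tutorial's model and iterator (placeholders only).
# The fake batches mimic torchtext's default layout: [seq_len, batch_size].
model = nn.EmbeddingBag(100, 6)            # token ids -> 6 logits per example
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())

def fake_batches():
    for seq_len, batch_size in [(50, 32), (70, 32), (40, 17)]:
        x = torch.randint(0, 100, (seq_len, batch_size))
        y = torch.randint(0, 2, (batch_size, 6)).float()
        yield x, y

running_loss = 0.0
n_examples = 0
for x, y in fake_batches():
    optimizer.zero_grad()
    preds = model(x.t().contiguous())      # EmbeddingBag expects [batch, seq]
    loss = criterion(preds, y)             # mean loss over this batch
    loss.backward()
    optimizer.step()

    batch_size = x.size(1)                 # batch dim is 1 with batch_first=False
    running_loss += loss.item() * batch_size
    n_examples += batch_size

epoch_loss = running_loss / n_examples     # per-example average over the epoch
print(epoch_loss)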