deep-belief-nets-for-topic-modeling
Wrong derivative of first layer in get_grad_and_error?
Hi Lars,
I have a question about the derivative you use during backpropagation. I haven't quite understood why you use the normalized data when evaluating the gradient for the weights of the input layer. Shouldn't it be the unnormalized x values instead?
In the get_grad_and_error function of the fine-tuning code, you calculate:
x[:, :-1] = get_norm_x(x[:, :-1])
...
for i in range(number_of_weights - 1, -1, -1):
    if i == number_of_weights - 1:
        ...
    elif i == 0:
        ...
        grad = dot(x.T, delta)  # <--- unnormalized inputs here?
    else:
        ...
Shouldn't it instead be:
D = x[:, :-1].sum(axis=1)
D = D[:, numpy.newaxis]
...
for i in range(number_of_weights - 1, -1, -1):
    if i == number_of_weights - 1:
        ...
    elif i == 0:
        ...
        grad = numpy.dot(numpy.append(x[:, :-1], D, axis=1).T, delta)
    else:
        ...
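To make the difference between the two candidates concrete, here is a minimal sketch with toy data (the bias column is omitted, and delta stands in for whatever error signal is backpropagated to the first layer; none of these values come from the repository). It shows that the two gradients differ only by a per-document re-weighting by the document length:

import numpy as np

counts = np.array([[2., 0., 4.],
                   [1., 3., 0.]])            # raw word counts: 2 documents x 3 words
D = counts.sum(axis=1, keepdims=True)        # document lengths: 6 and 4
delta = np.array([[0.5, -0.2],
                  [0.1,  0.3]])              # toy error signal reaching the first layer

grad_normalised = (counts / D).T @ delta     # gradient from normalised inputs, as in the repo
grad_raw = counts.T @ delta                  # gradient from raw counts, as proposed above

# The raw-count gradient equals the normalised-count gradient with each
# document's error term scaled by its length D_i:
print(np.allclose(grad_raw, (counts / D).T @ (D * delta)))   # True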
Hello,
Sorry for the late reply. The reason for this is to make sure that the log in the cross-entropy cost function doesn't complain. It has no effect on the performance of the model, but it ensures that you don't get a divide-by-zero error:
>>> np.log(0)
__main__:1: RuntimeWarning: divide by zero encountered in log
-inf
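For completeness, a common safeguard, which is not necessarily what this repository does, is to clamp the probabilities away from zero before taking the log; safe_cross_entropy below is a hypothetical helper illustrating the idea:

import numpy as np

def safe_cross_entropy(targets, probs, eps=1e-12):
    # Clamp probabilities into [eps, 1] so np.log never sees 0 and
    # never emits the RuntimeWarning / -inf shown above.
    probs = np.clip(probs, eps, 1.0)
    return -np.sum(targets * np.log(probs))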
Please let me know if you have any other questions.
Thanks.
Best regards
Lars Maaløe
PhD Student, DTU Compute, Technical University of Denmark (DTU)
Hi,
Yes, I see that you need to avoid division by zero. My question is not why you use the probability of words in the cross-entropy error function, but rather: why do you use the probability-of-words array instead of the word-count array when evaluating the gradient of the first layer (i = 0)?
Hello again,
It is a common trick to compare the probabilities to the normalised word counts to avoid sampling from the multinomial distribution.
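For readers unfamiliar with the trick: as I read it, instead of drawing an explicit word vector from each document's multinomial, the normalised counts (the word probabilities themselves) are used directly as the visible values, a deterministic mean-field-style shortcut that removes sampling noise. A minimal sketch with toy data and hypothetical names:

import numpy as np

rng = np.random.default_rng(0)
counts = np.array([[3., 0., 2.],
                   [1., 4., 0.]])                  # word counts per document
p = counts / counts.sum(axis=1, keepdims=True)     # normalised counts = word probabilities
D = counts.sum(axis=1).astype(int)                 # document lengths: 5 and 5

# Mean-field shortcut: use the probabilities p directly as the visible values.
visible_mean_field = p

# Sampling alternative (what the trick avoids): draw each document's D_i words
# from a multinomial over its word distribution.
visible_sampled = np.stack([rng.multinomial(n, pi) for n, pi in zip(D, p)])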
Best regards
Lars Maaløe
PhD Student, DTU Compute, Technical University of Denmark (DTU)