
Wrong derivative of first layer in get_grad_and_error?

Open simNN7 opened this issue 9 years ago • 3 comments

Hi Lars,

I have a question about the derivative you use during backpropagation. I haven't quite understood why you use the normalized data when evaluating the gradient for the weights of the input layer. Shouldn't it be just the unnormalized x values?

In the get_grad_and_error function of the fine-tuning code, you compute:

    x[:, :-1] = get_norm_x(x[:, :-1])
    ...
    for i in range(number_of_weights - 1, -1, -1):
        if i == number_of_weights - 1:
            ...
        elif i == 0:
            ...
            grad = dot(x.T, delta)  # <--- unnormalized inputs here?
        else:
            ...
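For reference, get_norm_x is not shown in this thread. Presumably it rescales each document's word counts into probabilities that sum to one; a minimal sketch of that assumption (not the repository's actual code):

```python
import numpy as np

def get_norm_x(x):
    # Assumed behaviour: divide each document's word counts by its
    # total count, so every row becomes a probability distribution.
    totals = x.sum(axis=1)[:, np.newaxis]
    return x / totals

counts = np.array([[2.0, 1.0, 1.0],
                   [0.0, 3.0, 1.0]])
probs = get_norm_x(counts)
# each row of probs now sums to 1
```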

Is it not:

    D = x[:, :-1].sum(axis=1)
    D = D[:, numpy.newaxis]
    ...
    for i in range(number_of_weights - 1, -1, -1):
        if i == number_of_weights - 1:
            ...
        elif i == 0:
            ...
            grad = numpy.dot(numpy.append(x[:, :-1], D, axis=1).T, delta)
        else:
            ...

simNN7 avatar Mar 07 '15 14:03 simNN7

Hello,

Sorry for the late reply. The reason is to make sure that the log in the cross-entropy cost function never receives a zero. It has no effect on the performance of the model, but it ensures that you do not hit the divide-by-zero warning:

    >>> np.log(0)
    __main__:1: RuntimeWarning: divide by zero encountered in log
    -inf
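To see the warning in isolation (generic NumPy behaviour; the clipping below is a common guard, not code from this repository):

```python
import numpy as np

# log(0) produces -inf together with a RuntimeWarning.
with np.errstate(divide="ignore"):
    assert np.log(0.0) == -np.inf

# A common guard: keep probabilities strictly positive before the log.
eps = 1e-12
p = np.array([0.0, 0.5, 0.5])
safe = np.log(np.clip(p, eps, 1.0))  # finite everywhere
assert np.all(np.isfinite(safe))
```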

Please let me know if you have any other questions.

Thanks.

Best regards


Lars Maaløe
PhD Student, DTU Compute
Technical University of Denmark (DTU)

Email: [email protected], [email protected]
Phone: 0045 2229 1010
Skype: lars.maaloe
LinkedIn: http://dk.linkedin.com/in/larsmaaloe
DTU Orbit: http://orbit.dtu.dk/en/persons/lars-maaloee(0ba00555-e860-4036-9d7b-01ec1d76f96d).html


larsmaaloee avatar Mar 12 '15 10:03 larsmaaloee

Hi,

Yes, I see that you need to avoid division by zero. My question is not why you use the probability of words in the cross-entropy error function, but rather: why do you use the probability-of-words array instead of the word-count array when evaluating the gradient of the first layer (i == 0)?

simNN7 avatar Mar 12 '15 20:03 simNN7

Hello again,

It is a common trick to compare the probabilities to the normalised word counts to avoid sampling from the multinomial distribution.
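A small illustration of that trick, with made-up numbers (not code from the repository): rather than drawing a noisy count vector from the multinomial, the expected counts n * p can be used directly, which is deterministic and differentiable:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])  # word probabilities for one document
n = 10                         # document length (total word count)

# Sampling from the multinomial gives a noisy integer count vector...
sample = rng.multinomial(n, p)
assert sample.sum() == n

# ...whereas the mean-field shortcut uses the expectation n * p,
# which is deterministic and plays nicely with gradient descent.
expected = n * p
assert np.allclose(expected, [5.0, 3.0, 2.0])
```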

Best regards




larsmaaloee avatar Mar 19 '15 10:03 larsmaaloee