Potential bug in tf.contrib.distributions.Normal

Open chingyaoc opened this issue 8 years ago • 4 comments

Hi Denny,

Recently I've been working on a continuous control reinforcement learning task. I followed the steps in Continuous MountainCar Actor Critic Solution to construct PolicyEstimator(). However, the log probability returned by self.normal_dist.log_prob() becomes positive when self.mu becomes a small value (<0.2). I'm wondering whether this is a bug in TensorFlow itself, since they calculate the pdf as f(x) = sqrt(1/(2*pi*sigma^2)) exp(-(x-mu)^2/(2*sigma^2)). Did you face the same problem while implementing the policy?

Best, James

Jan 03 '17 17:01 chingyaoc

Hm, interesting. I do remember having some problems with the policy, but it worked most of the time, so I didn't really look into it. I recommend you file a bug with TensorFlow.

Jan 04 '17 16:01 dennybritz

That's not a bug.

Because the normal distribution is a continuous probability distribution, self.normal_dist.prob() actually returns the value of the probability density function (pdf), which can be any value greater than 0, including values above 1. So don't be surprised if you get a positive value when you call self.normal_dist.log_prob().
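To make this concrete, here is a short derivation sketch based on the pdf formula quoted in the original question. It only assumes the standard normal density with mean mu and standard deviation sigma; nothing here is specific to TensorFlow's implementation.

```latex
\log f(x) = -\log\sigma - \tfrac{1}{2}\log(2\pi) - \frac{(x-\mu)^2}{2\sigma^2}

% At the mean, x = \mu, the quadratic term vanishes:
\log f(\mu) = -\log\sigma - \tfrac{1}{2}\log(2\pi)

% This is positive exactly when
\sigma < \frac{1}{\sqrt{2\pi}} \approx 0.399
```

So whenever sigma drops below roughly 0.4, the log density near the mean is positive, which is expected behaviour for a density.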

Feb 08 '17 20:02 poweic

@JamesChuanggg
A distribution should retrieve a value between 0 and 1. Log of that value should always be negative. How could you get a positive value from self.normal_dist.log_prob( ) method?

@botonchou You are talking about the sample point. They can be any real number. However, we are talking about (log) probability.

Jan 31 '18 08:01 huiwenzhang

@huiwenzhang

I believe your assumption is false, if I correctly understand what you mean by distribution.

The values of a probability density function are not necessarily less than 1. (They are not probabilities.)

They can be greater than 1 when the mass is concentrated around a few values. For example, the probability density function of the normal distribution with mean 0 and standard deviation 0.25 has values greater than 1 over the interval [-0.225, +0.225].
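The example above can be checked numerically with a minimal sketch in plain Python. The helper names normal_pdf and normal_log_prob are made up for illustration and are not part of TensorFlow; they just evaluate the textbook formula, assuming mean 0 and standard deviation 0.25.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution (hypothetical helper, textbook formula)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def normal_log_prob(x, mu=0.0, sigma=1.0):
    """Log of the density above, computed directly for numerical stability."""
    return (-math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi)
            - ((x - mu) ** 2) / (2.0 * sigma ** 2))


sigma = 0.25

# Density and log density at the mean, at the edge of the quoted interval,
# and further out in the tail.
for x in (0.0, 0.225, 0.5):
    print(x, normal_pdf(x, 0.0, sigma), normal_log_prob(x, 0.0, sigma))

# The density exceeds 1 near the mean, yet it still integrates to about 1.
# Simple Riemann sum over [-2, 2], i.e. 8 standard deviations on each side.
step = 0.001
total = sum(normal_pdf(-2.0 + i * step, 0.0, sigma) * step for i in range(4001))
print("approximate integral:", total)
```

The log density is positive at 0.0 and 0.225 and negative at 0.5, while the area under the curve stays close to 1. A density value above 1 therefore does not contradict probabilities being at most 1.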

Hope this helps.

Jul 11 '19 03:07 engheta
