baselines Gumbel Distribution and Derivability

Gumbel Distribution and Derivability

Open mm1212345 opened this issue 4 years ago • 0 comments

Hey there! I am currently working my way through how an action is sampled from a categorical variable. As I understand it, Gumbel noise is added to the logits so that the sampled action follows the probabilities implied by those logits, and that is the reason for the double log. Is that correct?

But the action is still chosen with tf.argmax(self.logits - tf.log(-tf.log(u)), axis=-1). Doesn't the argmax operation still make the whole sampling process non-differentiable? What am I missing?
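
To make the question concrete, here is a minimal sketch of the same expression in plain Python (standard library only, not TensorFlow). The function names `softmax`, `gumbel_max_sample`, and `empirical_frequencies` are made up for illustration and are not part of baselines. It only checks that taking the argmax of logits plus Gumbel noise produces actions with roughly the softmax frequencies.

```python
import math
import random
from collections import Counter


def softmax(logits):
    """Probabilities implied by the logits (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_max_sample(logits, rng):
    """Hypothetical helper: argmax over logits perturbed by Gumbel noise."""
    noisy = []
    for x in logits:
        u = rng.random()  # uniform in [0, 1)
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        # -log(-log(u)) is a standard Gumbel sample: this is the double log
        noisy.append(x - math.log(-math.log(u)))
    # argmax returns an integer index, not a smooth function of the logits
    return max(range(len(logits)), key=noisy.__getitem__)


def empirical_frequencies(logits, n_samples, seed=0):
    """Fraction of samples that landed on each action index."""
    rng = random.Random(seed)
    counts = Counter(gumbel_max_sample(logits, rng) for _ in range(n_samples))
    return [counts[i] / n_samples for i in range(len(logits))]


if __name__ == "__main__":
    logits = [1.0, 2.0, 0.5]
    print("softmax:  ", softmax(logits))
    print("empirical:", empirical_frequencies(logits, 50_000))
```

The returned action is an integer index, so a small change to the logits leaves it unchanged almost everywhere. That is the sense in which the argmax step itself provides no useful gradient, which is what I am asking about.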

Feb 19 '21 15:02 mm1212345
