Reinforcement_Learning

Dueling question

Open rustequal opened this issue 4 years ago • 1 comments

Hello, thanks for a great project. It's very useful. I have a question about the model code for the Dueling algorithm, for example in Pong-v0_DQN_CNN_TF2.py.

Here is an example of the code:

    action_advantage = Lambda(lambda a: a[:, :] - K.mean(a[:, :], keepdims=True), output_shape=(action_space,))(action_advantage)

Let's say our batch looks like this:

    a = tf.constant([[1.0, 2.0], [-2.0, 3.0], [3.0, -4.0]])
    print('a=', a)

    a= tf.Tensor(
    [[ 1.  2.]
     [-2.  3.]
     [ 3. -4.]], shape=(3, 2), dtype=float32)

The result of the "K.mean" function will be a tensor with shape (1, 1):

    print('Kmean=', K.mean(a[:, :], keepdims=True))

    Kmean= tf.Tensor([[0.5]], shape=(1, 1), dtype=float32)

Shouldn't there be a tensor with shape (3, 1)?

    print('Kmean=', K.mean(a[:, :], axis=1, keepdims=True))

    Kmean= tf.Tensor(
    [[ 1.5]
     [ 0.5]
     [-0.5]], shape=(3, 1), dtype=float32)

If we assume that our batch contains 3 elements, then the mean should be calculated separately for each element in the batch, i.e. over the action axis only. Or am I missing something?
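To make the difference concrete, here is a minimal sketch of the two centering variants on the same batch. It uses NumPy rather than the Keras backend, on the assumption that `np.mean` with `keepdims=True` reduces the same way as `K.mean` for this case. The helper names `center_global` and `center_per_sample` are made up for illustration and are not part of the project.

```python
import numpy as np


def center_global(adv):
    """Subtract one mean taken over the whole batch (no axis given)."""
    return adv - np.mean(adv, keepdims=True)


def center_per_sample(adv):
    """Subtract a separate mean per batch element (over the action axis)."""
    return adv - np.mean(adv, axis=1, keepdims=True)


# Same batch as in the question: 3 samples, 2 actions each.
a = np.array([[1.0, 2.0], [-2.0, 3.0], [3.0, -4.0]], dtype=np.float32)

global_centered = center_global(a)
per_sample_centered = center_per_sample(a)

print(np.mean(a, keepdims=True).shape)          # (1, 1)
print(np.mean(a, axis=1, keepdims=True).shape)  # (3, 1)

# With the per-sample mean, each row of advantages sums to zero.
print(per_sample_centered.sum(axis=1))

# With the global mean, a row's result depends on the other rows in the batch.
print(global_centered)
```

Subtracting a constant does not change which action has the largest advantage in a row, so greedy action selection would be the same in both variants. The resulting values differ, though, and with the global mean they depend on which other samples happen to be in the batch.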

rustequal, Dec 18 '20 23:12

same here

RangeOfGlitching, Jun 14 '22 15:06