[Categorical DQN/Rainbow] Inconsistent behavior of Categorical DQN for an even number of atoms

Open Flo-Wo opened this issue 1 year ago • 0 comments

For an even number of atoms, the calculation of self._a_values (see here) can produce one atom too many. This behavior is reproducible via

import torch
v_min = -5
v_max = 5
n_atoms = 20
delta = (v_max - v_min) / (n_atoms - 1)  # delta = 0.5263157894736842
torch.arange(v_min, v_max + delta, delta)

which yields

tensor([-5.0000, -4.4737, -3.9474, -3.4211, -2.8947, -2.3684, -1.8421, -1.3158,
        -0.7895, -0.2632,  0.2632,  0.7895,  1.3158,  1.8421,  2.3684,  2.8947,
         3.4211,  3.9474,  4.4737,  5.0000,  5.5263])

This tensor has 21 elements instead of the expected 20, and its last value, 5.5263, exceeds v_max. The expected result would be this tensor:

tensor([-5.0000, -4.4737, -3.9474, -3.4211, -2.8947, -2.3684, -1.8421, -1.3158,
        -0.7895, -0.2632,  0.2632,  0.7895,  1.3158,  1.8421,  2.3684,  2.8947,
         3.4211,  3.9474,  4.4737,  5.0000])

Following the torch.arange documentation, an easy solution would be to add a small eps value to v_max instead of adding delta, e.g.

self._a_values = torch.arange(self._v_min, self._v_max + 10e-9, delta)

Other options are to cut off the last value when the tensor is too long, or to use an internal eps value instead of a hard-coded one.
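
A further option, not taken from the mushroom-rl code, is to build the support from atom indices so that its length is fixed by n_atoms and does not depend on how the end point rounds. The sketch below uses plain Python to illustrate the idea; the helper name atom_support is hypothetical, and in torch the same effect should be obtainable with torch.linspace(v_min, v_max, n_atoms).

```python
def atom_support(v_min, v_max, n_atoms):
    """Return n_atoms evenly spaced values from v_min to v_max (inclusive).

    Hypothetical helper for illustration; not part of mushroom-rl.
    """
    if n_atoms < 2:
        raise ValueError("n_atoms must be at least 2")
    delta = (v_max - v_min) / (n_atoms - 1)
    # Index-based construction: the length is fixed by range(n_atoms),
    # so rounding in delta cannot add or drop an element.
    values = [v_min + i * delta for i in range(n_atoms)]
    # Pin the last atom to v_max so accumulated rounding cannot shift it.
    values[-1] = float(v_max)
    return values


support = atom_support(-5, 5, 20)
print(len(support))  # 20
```

Because the number of atoms is given explicitly, this approach needs no eps value and behaves the same for even and odd n_atoms.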

Flo-Wo avatar Jul 24 '22 19:07 Flo-Wo