
Potential bug in config

Open · ezhang7423 opened this issue 2 years ago · 1 comment

Hi, I may just be misunderstanding the code, but I believe I found a bug. Currently, ./config/beta/bc.yaml sets the behavior policy distribution type to 'normal', but this policy is then loaded into the pilearner on line 78, which currently uses 'trunc'. Is this a bug?

ezhang7423, May 04 '22 06:05

I don't think this is a bug, but it is perhaps a bit of a rough edge. Since actions deployed in the environment must be clipped anyway, the samples taken from the truncated distribution will be the same as the ones taken from the normal distribution. But when training with the BC loss, it is more stable and computationally faster to use an ordinary (untruncated) normal distribution instead of a truncated one.

davidbrandfonbrener, May 04 '22 13:05
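
For readers following the thread, here is a rough sketch of the trade-off described in the reply above. It is not code from the onestep-rl repository: the function names, the scalar action, and the action bounds of [-1, 1] are all assumptions made for illustration. It shows that the plain normal log-likelihood (the quantity a BC loss would maximize) is a closed-form expression, while the truncated version needs an extra normalizing term computed from the CDF, and that samples are clipped before being sent to the environment.

```python
import math
import random


def normal_log_prob(x, mean, std):
    """Log-density of an untruncated normal; cheap closed form."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def _std_normal_cdf(z):
    """CDF of N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_log_prob(x, mean, std, low=-1.0, high=1.0):
    """Log-density of a normal truncated to [low, high] (bounds are assumed).

    The extra log(mass) term is the added cost, and it can become
    numerically fragile when the mean drifts far outside the bounds.
    """
    if not (low <= x <= high):
        return float("-inf")
    mass = _std_normal_cdf((high - mean) / std) - _std_normal_cdf((low - mean) / std)
    return normal_log_prob(x, mean, std) - math.log(mass)


def sample_clipped_action(mean, std, low=-1.0, high=1.0, rng=random):
    """Sample from the untruncated normal, then clip as an environment would."""
    raw = rng.gauss(mean, std)
    return min(max(raw, low), high)
```

With something like this, a BC loss would be the negative mean of `normal_log_prob` over dataset actions, and `sample_clipped_action` would only be used when acting in the environment.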