onestep-rl
Potential bug in config
Hi, I may just be misunderstanding the code, but I believe I found a bug. Currently, in ./config/beta/bc.yaml, the behavior policy distribution type is set to 'normal', but this config is loaded into the pilearner on line 78, which currently uses 'trunc'. Is this a bug?
I don't think this is a bug, but it is perhaps a bit of a rough edge. Since actions deployed in the environment must be clipped anyway, the samples taken from the truncated distribution will be the same as the ones taken from the normal distribution. During training with the bc loss, however, it is more stable and computationally faster to use a plain normal distribution instead of a truncated one.
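
To make the distinction concrete, here is a minimal sketch of the idea, not the code in this repo. The function names, the fixed standard deviation, and the action bounds of [-1, 1] are all assumptions made for illustration. The bc loss uses the plain normal log-density, and clipping is applied only when an action is sent to the environment.

```python
import math
import random


def normal_log_prob(action, mean, std):
    """Log-density of an untruncated normal; no normalization over bounds is needed."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def bc_loss(actions, means, std):
    """Hypothetical bc loss: mean negative log-likelihood of the dataset actions."""
    total = sum(normal_log_prob(a, m, std) for a, m in zip(actions, means))
    return -total / len(actions)


def sample_clipped_action(mean, std, low=-1.0, high=1.0, rng=random):
    """Sample from the normal, then clip to the (assumed) action bounds for deployment."""
    raw = rng.gauss(mean, std)
    return max(low, min(high, raw))


if __name__ == "__main__":
    dataset_actions = [0.2, -0.5, 0.9]
    predicted_means = [0.1, -0.4, 1.0]
    print("bc loss:", bc_loss(dataset_actions, predicted_means, std=0.5))
    print("deployed action:", sample_clipped_action(mean=0.95, std=0.5))
```

A truncated density would additionally need the normal CDF at both bounds inside the log-probability, which is the extra cost and source of numerical trouble that the plain normal avoids during training.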