random-network-distillation
Wrong PPO Model architecture.
According to the DQN Nature paper and the PPO1 implementation, this line:
X = activ(conv(X, 'c3', nf=64, rf=4, stride=1, init_scale=np.sqrt(2), data_format=data_format))
should be changed to:
X = activ(conv(X, 'c3', nf=64, rf=3, stride=1, init_scale=np.sqrt(2), data_format=data_format))
In short, the kernel size of the third conv layer is wrong!
What is the difference between these two lines?
@xiaioding
The difference is in the kernel size (rf): the current line uses rf=4, while the proposed line uses rf=3.
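To make the effect of that one-number change concrete, here is a rough sketch that computes the spatial size of each feature map. It assumes an 84x84 input, no padding ('VALID'), and that the first two layers use the DQN Nature paper's settings (8x8 kernel with stride 4, then 4x4 kernel with stride 2). Those settings and the helper names are assumptions for illustration and are not taken from the repository.

```python
def conv_out(size, kernel, stride):
    """Output length along one axis for a convolution with no padding."""
    return (size - kernel) // stride + 1


def feature_map_sizes(input_size, layers):
    """Return the spatial size after each (kernel, stride) layer in turn."""
    sizes = []
    size = input_size
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride)
        sizes.append(size)
    return sizes


# Assumed (kernel, stride) pairs; only the third layer differs.
dqn_layers = [(8, 4), (4, 2), (3, 1)]      # third layer with rf=3
current_layers = [(8, 4), (4, 2), (4, 1)]  # third layer with rf=4

print(feature_map_sizes(84, dqn_layers))      # [20, 9, 7]
print(feature_map_sizes(84, current_layers))  # [20, 9, 6]
```

Under these assumptions, rf=3 gives a 7x7 final feature map (7 * 7 * 64 = 3136 features after flattening), while rf=4 gives 6x6 (2304 features), so the two lines produce differently sized networks.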