random-network-distillation Wrong PPO Model architecture.

Open alirezakazemipour opened this issue 4 years ago • 2 comments

According to the DQN Nature paper and the PPO1 implementation, this line:

X = activ(conv(X, 'c3', nf=64, rf=4, stride=1, init_scale=np.sqrt(2), data_format=data_format))

should be changed to:

X = activ(conv(X, 'c3', nf=64, rf=3, stride=1, init_scale=np.sqrt(2), data_format=data_format))

In short, the kernel size of the third conv layer is wrong: rf should be 3, not 4!
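To make the effect concrete, here is a rough sketch of how the two kernel sizes change the output of the third conv layer. It assumes the usual Nature DQN setup, which this issue does not spell out: 84x84 input frames, no padding (VALID), first two layers of 8x8 stride 4 and 4x4 stride 2, and 64 input channels going into c3. The helper names (conv_out, trunk_out, conv_weights) are made up for illustration and are not part of the repository.

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a convolution with no padding (VALID)."""
    return (size - kernel) // stride + 1


def trunk_out(rf3, size=84):
    """Side length of the feature map after the three conv layers.

    Assumes the Nature DQN layout: 8x8 stride 4, then 4x4 stride 2,
    then an rf3 x rf3 kernel with stride 1 (the layer discussed here).
    """
    size = conv_out(size, kernel=8, stride=4)  # c1
    size = conv_out(size, kernel=4, stride=2)  # c2
    return conv_out(size, kernel=rf3, stride=1)  # c3


def conv_weights(kernel, c_in=64, c_out=64):
    """Number of kernel weights (biases excluded); c_in=64 is an assumption."""
    return kernel * kernel * c_in * c_out


for rf in (3, 4):
    side = trunk_out(rf)
    print(f"rf={rf}: c3 output {side}x{side}x64 = {side * side * 64} features, "
          f"{conv_weights(rf)} weights in c3")
```

Under these assumptions, rf=3 gives the 7x7x64 feature map described in the DQN paper, while rf=4 gives 6x6x64 and a larger c3 kernel, so the flattened input to the following dense layer also changes size.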

Oct 06 '20 16:10 alirezakazemipour

What is the difference between these two lines?

Apr 10 '23 04:04 xiaioding

@xiaioding The difference is in the kernel size (rf): 4 in the current code versus 3 in the proposed fix.

Apr 11 '23 14:04 alirezakazemipour