1 comment by young
@MarcoMeter I also used Torchviz to check that the gradient of the recurrent policy was back-propagated correctly. At first, I got the same result as you showed in the PyTorch...