pytorch_a3c

expected a Variable arg but got numpy.ndarray error

Open dylanthomas opened this issue 7 years ago • 6 comments

I am new to PyTorch. I just cloned your code and ran it, but got an error. I hope you can point me in the right direction to fix this issue.

More specifics:

  1. Used a conda env with Python 3.6
  2. Ran 'run_a3c.py' with the default Breakout-v0 env to the end, then ran 'python test_a3c.py --render --monitor --env Breakout-v0'
  3. Got the error message below:

```
File "test_a3c.py", line 71, in test(policy, args)
File "test_a3c.py", line 25, in test
    p, v = policy(o)
...
File "/home/john/anaconda3/envs/th/lib/python3.6/site-packages/torch/nn/functional.py", line 37, in conv2d
    return f(input, weight, bias) if bias is not None else f(input, weight)
RuntimeError: expected a Variable argument, but got numpy.ndarray
```

Could you tell me what the issue(s) could be here?

Many thanks,

John

dylanthomas avatar Mar 06 '17 01:03 dylanthomas

A torch.nn.Module must take a torch.autograd.Variable as input, but policy (which is a subclass of torch.nn.Module) is being fed a numpy.ndarray, so we have to convert the numpy.ndarray to a Variable. I fixed this problem; see my commit 9e9fb687786a025061561c7260ba9b586e9ca4ce.
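For anyone hitting the same error, here is a minimal sketch of the kind of conversion the fix performs. The helper name and the (C, H, W) observation shape are illustrative assumptions, not taken from the commit:

```python
import numpy as np
import torch
from torch.autograd import Variable

def to_variable(obs):
    # Wrap a numpy observation so an nn.Module can consume it.
    # Assumes obs is an array shaped (C, H, W); a batch dimension is added here.
    tensor = torch.from_numpy(np.ascontiguousarray(obs, dtype=np.float32))
    return Variable(tensor.unsqueeze(0))

# Illustrative usage at the call site that raised the error:
# p, v = policy(to_variable(o))
```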

rarilurelo avatar Mar 06 '17 02:03 rarilurelo

Many thanks.

On another note, when I ran Breakout-v0, the reward I got after 10M steps was only around 30~40. But shouldn't this be around 400 according to DeepMind's paper? I wonder where the difference is coming from... Any thoughts/insight on this?

dylanthomas avatar Mar 06 '17 07:03 dylanthomas

There are some differences between my code and DeepMind's paper. My code has:

  1. no LSTM
  2. no gradient clipping (sketched below)
  3. no hyperparameter tuning (I couldn't find the learning rate in the paper)

That's why the result was not good enough, I think.
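A minimal sketch of what adding gradient clipping would look like, written against current PyTorch (where the helper is spelled clip_grad_norm_). The tiny model, optimizer, and max-norm value are illustrative stand-ins, not taken from this repo:

```python
import torch
import torch.nn as nn

# Tiny stand-in model and optimizer just to show where clipping fits in a step;
# the real policy network and loss are defined elsewhere.
model = nn.Linear(4, 2)
optimizer = torch.optim.RMSprop(model.parameters(), lr=7e-4)

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their global L2 norm stays under the chosen threshold
# (40.0 is an example value, not taken from this repo).
nn.utils.clip_grad_norm_(model.parameters(), max_norm=40.0)
optimizer.step()
```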

rarilurelo avatar Mar 06 '17 11:03 rarilurelo

Thank you for your reply. Two points --

On the param setting, are you aware of this wiki ( https://github.com/muupan/async-rl/wiki ) ?

On the performance issue of the TensorFlow implementation, have you seen this discussion: https://github.com/dennybritz/reinforcement-learning/issues/30 ? It's about DQN, but the same issues are supposed to be the root cause on the A3C side as well.

There, cgel suggests the following are key:

Important stuff:

  1. Normalize the input to [0, 1]
  2. Clip rewards to [0, 1]
  3. Don't tf.reduce_mean the losses in the batch; use tf.reduce_max
  4. Initialize the network properly with Xavier init
  5. Use the optimizer the paper uses; it is not the same RMSProp as in TF

Has your code incorporated all of the points above?
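For reference, the input-normalization, reward-clipping, and Xavier-init points translate to PyTorch roughly as follows. This is a sketch under the thread's suggestions; the function names are illustrative, and the loss-reduction and shared-RMSProp points are TF-specific and not covered here:

```python
import numpy as np
import torch.nn as nn

def preprocess_frame(frame):
    # Normalize raw Atari pixels (uint8 in [0, 255]) into [0, 1].
    return np.asarray(frame, dtype=np.float32) / 255.0

def clip_reward(r):
    # Clip rewards into the [0, 1] range suggested above.
    return float(np.clip(r, 0.0, 1.0))

def init_xavier(module):
    # Xavier-initialize conv and linear layers; apply with model.apply(init_xavier).
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.constant_(module.bias, 0.0)
```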

dylanthomas avatar Mar 08 '17 01:03 dylanthomas

@dylanthomas did you try running Breakout-v0 for longer than 10M timesteps to see if avg reward eventually got to >400? For example, it took Muupan's A3C https://github.com/muupan/async-rl#a3c-ff 20M timesteps to start getting to >400.

ethancaballero avatar Mar 14 '17 07:03 ethancaballero

Not yet, but I will run this code for 20M timesteps to see if it goes up to 400. @ethancaballero

dylanthomas avatar Mar 15 '17 00:03 dylanthomas