DDPG with MultiInputProcessor not working

Open srivatsankrishnan opened this issue 6 years ago • 6 comments

Hi, I have used MultiInputProcessor with DQN and it works fine. I am now trying to use it to train an agent with DDPG. I have three inputs from my environment: an image and two 1D vectors of shape (1, 3), called vec_1 and vec_2.

My Actor-network is defined as:

actor = Model(inputs=[image, vec_1, vec_2],
              outputs=actions)

My Critic-network is defined as:

critic = Model(inputs=[actions, image, vec_1, vec_2],
               outputs=action_1)

"action_1" is of the same dimension as "action".

processor = MultiInputProcessor(nb_inputs=3)

agent = DDPGAgent(processor=processor, ..., target_model_update=1e-3)
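For completeness, here is a simplified, self-contained version of this wiring (shapes, layer sizes, and the Adam settings are placeholders, not my real model; note that the critic in this sketch outputs a single Q-value, as standard DDPG expects):

from keras.layers import Input, Dense, Flatten, Concatenate
from keras.models import Model
from keras.optimizers import Adam
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.processors import MultiInputProcessor

nb_actions = 3  # placeholder action dimension

# Placeholder observation shapes; window_length=1 prepends a window axis.
def make_obs_inputs(prefix):
    image = Input(shape=(1, 84, 84, 3), name=prefix + '_image')
    vec_1 = Input(shape=(1, 3), name=prefix + '_vec_1')
    vec_2 = Input(shape=(1, 3), name=prefix + '_vec_2')
    return image, vec_1, vec_2

# Actor: observations -> actions
a_img, a_v1, a_v2 = make_obs_inputs('actor')
x = Concatenate()([Flatten()(a_img), Flatten()(a_v1), Flatten()(a_v2)])
x = Dense(64, activation='relu')(x)
actions = Dense(nb_actions, activation='tanh')(x)
actor = Model(inputs=[a_img, a_v1, a_v2], outputs=actions)

# Critic: action plus observations -> a single Q-value
action_input = Input(shape=(nb_actions,), name='action_input')
c_img, c_v1, c_v2 = make_obs_inputs('critic')
y = Concatenate()([action_input, Flatten()(c_img), Flatten()(c_v1), Flatten()(c_v2)])
y = Dense(64, activation='relu')(y)
q_value = Dense(1, activation='linear')(y)
critic = Model(inputs=[action_input, c_img, c_v1, c_v2], outputs=q_value)

processor = MultiInputProcessor(nb_inputs=3)
memory = SequentialMemory(limit=100000, window_length=1)
agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic,
                  critic_action_input=action_input, memory=memory,
                  processor=processor, target_model_update=1e-3)
agent.compile(Adam(lr=1e-3), metrics=['mae'])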

When I run the code, I get an AssertionError from processor.py: len(observation) is not the same as nb_inputs.

When I inspect the observation variable passed into processor.py, I see that my vec_1 and vec_2 inputs are missing.

Since the critic network takes the action as an input along with the observation, it has one more input than the actor network. So how does MultiInputProcessor work with DDPG? Are there any examples of using DDPG with MultiInputProcessor?

Thanks

srivatsankrishnan avatar Jan 17 '19 00:01 srivatsankrishnan

@srivatsankrishnan Can you post the full traceback? I've had some similar issues with multiple inputs and DDPG.

mattdornfeld avatar Jan 17 '19 03:01 mattdornfeld

Hi @mattdornfeld , Here is the trace:

Traceback (most recent call last):
  File "test_ddpg.py", line 177, in <module>
    agent.fit(env, nb_steps=50000, visualize=False, verbose=1, nb_max_episode_steps=200)
  File "/usr/local/lib/python2.7/dist-packages/keras_rl-0.4.2-py2.7.egg/rl/core.py", line 170, in fit
    action = self.forward(observation)
  File "/usr/local/lib/python2.7/dist-packages/keras_rl-0.4.2-py2.7.egg/rl/agents/ddpg.py", line 214, in forward
    action = self.select_action(state)  # TODO: move this into policy
  File "/usr/local/lib/python2.7/dist-packages/keras_rl-0.4.2-py2.7.egg/rl/agents/ddpg.py", line 198, in select_action
    batch = self.process_state_batch([state])
  File "/usr/local/lib/python2.7/dist-packages/keras_rl-0.4.2-py2.7.egg/rl/agents/ddpg.py", line 195, in process_state_batch
    return self.processor.process_state_batch(batch)
  File "/usr/local/lib/python2.7/dist-packages/keras_rl-0.4.2-py2.7.egg/rl/processors.py", line 37, in process_state_batch
    assert len(observation) == self.nb_inputs
AssertionError

In the case of DQN, the observation variable passed into processor.py was an array of arrays of the form [[image], [vec_1], [vec_2]].

In DDPG, the observation is of the form [[image]].
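To make the difference concrete, the failing check can be reproduced in isolation like this (placeholder arrays; the assert mirrors line 37 of rl/processors.py from the traceback):

import numpy as np

nb_inputs = 3
image = np.zeros((84, 84, 3))   # placeholder shapes
vec_1 = np.zeros((1, 3))
vec_2 = np.zeros((1, 3))

# DQN case: each observation arrives as a tuple with one entry per input.
observation_dqn = (image, vec_1, vec_2)
assert len(observation_dqn) == nb_inputs    # passes

# DDPG case as observed: only the image arrives.
observation_ddpg = (image,)
assert len(observation_ddpg) == nb_inputs   # raises AssertionError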

Did you get DDPG working with MultiInputProcessor? -Sri

srivatsankrishnan avatar Jan 17 '19 15:01 srivatsankrishnan

I think this can be amended with pull request #195: if your image and vectors (and any further inputs) are named variables, it is fairly easy to put them into their correct places. You would have to define the action as a named input to your model too, so it can be added. Then I would add an optional parameter to MultiInputProcessor, say ddpg=False, so that when it is set to True the assert accepts both self.nb_inputs and self.nb_inputs + 1. The action should then be correctly recognised and treated like any other input.

The only thing I have no immediate solution for is the backpropagation part (backward()) in the DDPG agent, which I haven't looked at in great detail; it would need to be updated for the dictionary case. This might be tedious.

Of course, you could also just try ddpg=True on the processor, allowing both lengths, and hope for the best in the reordering process.
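As an untested sketch of that idea (the class name and the ddpg flag are made up here; the loop roughly mirrors what MultiInputProcessor already does, stacking one batch per input):

import numpy as np
from rl.processors import MultiInputProcessor

class TolerantMultiInputProcessor(MultiInputProcessor):
    """Sketch: optionally accept one extra entry (the action) per observation."""

    def __init__(self, nb_inputs, ddpg=False):
        super(TolerantMultiInputProcessor, self).__init__(nb_inputs)
        self.ddpg = ddpg

    def process_state_batch(self, state_batch):
        allowed = (self.nb_inputs, self.nb_inputs + 1) if self.ddpg else (self.nb_inputs,)
        input_batches = [[] for _ in range(self.nb_inputs)]
        for state in state_batch:
            processed_state = [[] for _ in range(self.nb_inputs)]
            for observation in state:
                assert len(observation) in allowed
                # zip stops at the shorter list, so a trailing extra entry
                # (the action) would simply be ignored here.
                for o, s in zip(observation, processed_state):
                    s.append(o)
            for idx, s in enumerate(processed_state):
                input_batches[idx].append(s)
        return [np.array(x) for x in input_batches]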

bklebel avatar Jan 17 '19 22:01 bklebel

In case you have more actions, you can of course change the +1 to a variable. After all, the only problem you have is the assertion check, since you have two different numbers for the length of an observation (apart from the question of whether everything goes to the correct slots). What number you actually fill in would depend on the "shape" of your action.

bklebel avatar Jan 18 '19 10:01 bklebel

@bklebel Thanks for taking the time for this, I appreciate it. Pull request #195 requires me to change my current environment to use Dict spaces. I think that change should be fairly simple. But if backprop is not enabled for the dictionary case in DDPG, then it won't even learn, right?

You are right that it's the assertion error. I think the issue here is that the observation variable is not getting the other inputs the way it did in DQN; it only gets the image input.

I don't understand the flow of MultiInputProcessor for DDPG. Does it first pass the multi-input to the actor network and then later to the critic? Do we have to define two processors, one for the actor and one for the critic? Should the processor be rewritten with a flag that says when it is the actor and when it is the critic?

It would help to have a toy example to play with before applying this to my use case.

srivatsankrishnan avatar Jan 18 '19 15:01 srivatsankrishnan

Yes, that's true; as long as ddpg.py remains unchanged, with #195 it would not learn, and would possibly crash anyway.

Well, I guess my analysis of the situation was not that good; I did not look closely enough at your traceback. Your assertion is raised in select_action(state), where there should not be any interference from the action in the critic, since there is no action yet at that point.

In ddpg.py, as I see it, the action is never passed through the processor: the function process_state_batch(batch) is called three times, on lines 197, 263, and 264.

That is once in select_action, and twice in backward, where the batches of first and second states are passed through the processor separately. There is no need for the action to pass through the processor (and thus it does not happen), since it already exists separately and is added later (lines 274-281).
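Conceptually, something like this happens in backward() (a paraphrased sketch with placeholder shapes, not the literal keras-rl code):

import numpy as np

# What the processor returns: one array per observation input.
state0_batch = [np.zeros((32, 1, 84, 84, 3)),   # image
                np.zeros((32, 1, 3)),            # vec_1
                np.zeros((32, 1, 3))]            # vec_2
action_batch = np.zeros((32, 3))                 # stored separately in memory

# The action batch is only inserted afterwards, at the position of the
# critic's action input (index 0 in the critic definition above):
critic_inputs = list(state0_batch)
critic_inputs.insert(0, action_batch)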

There is literally no difference between DQN and DDPG in what is done with an observation until it reaches the processor. A toy model to check this would be nice, all the more since I cannot see how MultiInputProcessor would fail with DDPG when it works with DQN. Please check that you are up to date with the master branch and did not change anything there. If that is the case, I would follow the observation as it is passed from the environment through the agent, and try to find the spot where your image is separated from your vectors.
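A quick way to follow it would be a small debugging wrapper around the processor (illustrative; the class name is made up):

from rl.processors import MultiInputProcessor

class DebugMultiInputProcessor(MultiInputProcessor):
    """Print the structure of every incoming state batch, then delegate."""

    def process_state_batch(self, state_batch):
        for state in state_batch:
            for observation in state:
                print('entries:', len(observation),
                      'shapes:', [getattr(o, 'shape', None) for o in observation])
        return super(DebugMultiInputProcessor, self).process_state_batch(state_batch)

Swapping this in for the plain MultiInputProcessor should show exactly what reaches the processor in the DQN and DDPG cases, before the assert fires.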

And just to untangle this from #195: as long as you don't expressly want named inputs (because of a special model), you absolutely don't need it.

bklebel avatar Jan 18 '19 17:01 bklebel