
Vehicle type changed in safe_deepcopy_env()

Open rongliangzi opened this issue 3 years ago • 5 comments

Hi, I'm using MCTS in your rl-agents repo with the environment from your other repo, highway_env. In agents/common/factory.py, I understand that the function safe_deepcopy_env() copies the current state for simulation in MCTS. However, although I set "other_vehicles_type": "highway_env.vehicle.behavior.IDMVehicle", the v in "for k, v in obj.__dict__.items():" has the type highway_env.vehicle.controller.MDPVehicle (I printed type(v) to the console).

I want to change some properties of IDMVehicle when copying the state, but with MDPVehicle I cannot do that. In which function or .py file is the other vehicles' type changed to the MDP class?

rongliangzi avatar Sep 04 '20 07:09 rongliangzi

Hi,

Hi, I'm using MCTS in your rl-agents repo with the environment from your other repo, highway_env. In agents/common/factory.py, I understand that the function safe_deepcopy_env() copies the current state for simulation in MCTS

Correct.

However, although I set "other_vehicles_type": "highway_env.vehicle.behavior.IDMVehicle", the v in "for k, v in obj.__dict__.items():" has the type highway_env.vehicle.controller.MDPVehicle (I printed type(v) to the console).

Are you sure that it is the case for all vehicles? MDPVehicle is the class of the controlled (ego-) vehicle (in green) while other vehicles in the scenes are often modeled as IDMVehicles, in blue. Maybe you checked the type of the ego-vehicle only?

I want to change some properties of IDMVehicle when copying the state.

If you want, there is a mechanism that is quite close to what you need, which I called environment preprocessors. A preprocessor is an environment method that is called on the env copy (the one that will be used for planning) to modify it. You can add a method in highway_env that changes the properties of the IDMVehicles on the road, and then add a preprocessor to the agent configuration. For instance, there is a set_vehicle_field method that you can use. I use preprocessors e.g. to remove vehicles that are too far from the ego-vehicle (and thus irrelevant for planning) so as to decrease the computational load.
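To make the idea concrete, here is a minimal sketch of how such a preprocessor mechanism could work. It is not the rl-agents or highway_env source: ToyVehicle, ToyEnv and apply_preprocessors are hypothetical stand-ins, and the only things taken from this thread are the {"method": ..., "args": ...} configuration shape and the convention that a preprocessor returns a modified copy.

```python
import copy


class ToyVehicle:
    """Hypothetical stand-in for a highway_env vehicle."""

    def __init__(self, target_speed):
        self.target_speed = target_speed


class ToyEnv:
    """Hypothetical stand-in for an environment holding a list of vehicles."""

    def __init__(self, vehicles):
        self.vehicles = vehicles

    def set_vehicle_field(self, args):
        # Assumed preprocessor convention: modify a copy and return that copy.
        field, value = args
        env_copy = copy.deepcopy(self)
        for vehicle in env_copy.vehicles:
            setattr(vehicle, field, value)
        return env_copy


def apply_preprocessors(env, preprocessor_configs):
    """Call each configured method by name, chaining the returned envs."""
    for config in preprocessor_configs:
        method = getattr(env, config["method"])
        env = method(config["args"]) if "args" in config else method()
    return env


real_env = ToyEnv([ToyVehicle(20), ToyVehicle(20)])
planning_env = apply_preprocessors(
    real_env, [{"method": "set_vehicle_field", "args": ["target_speed", 15]}]
)
print([v.target_speed for v in real_env.vehicles])      # [20, 20]
print([v.target_speed for v in planning_env.vehicles])  # [15, 15]
```

Because each method is looked up by name on the environment, adding a new preprocessor in this sketch only requires adding a method to the env class and one entry to the configuration list.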

eleurent avatar Sep 04 '20 08:09 eleurent

Thanks for your timely reply!

(1) I modify the original safe_deepcopy_env() in factory.py to the following:

import copy
import gym

def safe_new_simulation_env(obj):
    """
    Perform a deep copy of an environment but without copying its viewer.
    """
    cls = obj.__class__
    result = cls.__new__(cls)
    memo = {id(obj): result}
    for k, v in obj.__dict__.items():
        if k not in ['viewer', 'automatic_rendering_callback', 'grid_render']:
            if isinstance(v, gym.Env):
                # Nested envs go through the original safe_deepcopy_env() from factory.py
                setattr(result, k, safe_deepcopy_env(v))
            elif k == 'vehicle':
                # Debug: inspect the type of the 'vehicle' attribute while copying
                v1 = copy.deepcopy(v, memo=memo)
                # v1.target_speed = 15
                print('safe_new_simulation_env', k, type(v1))
                setattr(result, k, v1)
            else:
                setattr(result, k, copy.deepcopy(v, memo=memo))
        else:
            # Rendering-related attributes are not copied
            setattr(result, k, None)
    return result

And here is what the console prints (many times): safe_new_simulation_env vehicle <class 'highway_env.vehicle.controller.MDPVehicle'>. So I think all the vehicles' classes are changed to MDPVehicle.

(2) What I want to do is, when copying the env, change a property (e.g. target_speed) as in the commented line above. You mentioned env preprocessors and set_vehicle_field. Following your idea, I did two things: (a) I added this in abstract.py:

def set_vehicle_property(self, args: object) -> 'AbstractEnv':
    value = args
    env_copy = copy.deepcopy(self)
    for v in env_copy.road.vehicles:
        if isinstance(v, IDMVehicle):
            setattr(v, 'target_speed', value)
    return env_copy

(b) In mcts.py->MCTS->plan(), I changed plan() like this. Is it right? I mean, will this change every other vehicle's target_speed to 15? Do I need to add new_env.reset() after new_env = state.set_vehicle_property(15)?

def plan(self, state, observation):
    print('mcts: ', type(state))
    for i in range(self.config['episodes']):
        if (i+1) % 10 == 0:
            logger.debug('{} / {}'.format(i+1, self.config['episodes']))
        new_env = state.set_vehicle_property(15)
        # new_env.reset()  # Is this necessary?
        self.run(new_env, observation)
        # self.run(safe_new_simulation_env(state), observation)
    return self.get_plan()

rongliangzi avatar Sep 04 '20 14:09 rongliangzi

(1) I modify the original safe_deepcopy_env() in factory.py to the following: And here is what the console prints (many times): safe_new_simulation_env vehicle <class 'highway_env.vehicle.controller.MDPVehicle'>. So I think all the vehicles' classes are changed to MDPVehicle.

No, you are reading the env.vehicle variable, which refers to the (controlled) ego-vehicle. All the vehicles in the scene are stored in the env.road.vehicles list.

(2) What I want to do is, when copying the env, change a property (e.g. target_speed) as in the commented line above. You mentioned env preprocessors and set_vehicle_field. Following your idea, I did two things: (a) I added this in abstract.py:
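As a quick way to check this distinction, the sketch below counts the vehicle classes found in env.road.vehicles instead of looking only at env.vehicle. The classes here are empty, hypothetical stand-ins that merely borrow the names from highway_env, and the assumption that the ego-vehicle also appears in the road's vehicle list is part of the toy setup.

```python
from collections import Counter


class MDPVehicle:
    """Empty stand-in named after the controlled (ego) vehicle class."""


class IDMVehicle:
    """Empty stand-in named after the class used for other vehicles."""


class ToyRoad:
    def __init__(self, vehicles):
        self.vehicles = vehicles


class ToyEnv:
    def __init__(self, vehicle, road):
        self.vehicle = vehicle  # the ego-vehicle only
        self.road = road        # holds every vehicle in the scene


ego = MDPVehicle()
# Assumption for this toy: the ego-vehicle is also listed on the road.
env = ToyEnv(ego, ToyRoad([ego, IDMVehicle(), IDMVehicle()]))

print(type(env.vehicle).__name__)  # MDPVehicle

# Inspect every vehicle in the scene rather than env.vehicle alone.
type_counts = Counter(type(v).__name__ for v in env.road.vehicles)
for name, count in type_counts.items():
    print(name, count)
```

Running the same kind of loop over the real env.road.vehicles should show whether the other vehicles kept the configured class.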

This seems alright.

(b) In mcts.py->MCTS->plan(), I changed plan() like this. Is it right? I mean, will this change every other vehicle's target_speed to 15? Do I need to add new_env.reset() after new_env = state.set_vehicle_property(15)?

I think the easiest way is to leave the code unchanged and simply modify the configuration of the MCTS agent. You are probably running the experiments.py script with configs/HighwayEnv/agents/MCTSAgents/baseline.json? If so, you will see that the current configuration is:

{
    "__class__": "<class 'rl_agents.agents.tree_search.mcts.MCTSAgent'>",
    "env_preprocessors": [{"method":"simplify"}]
}

You can simply change it to:

{
    "__class__": "<class 'rl_agents.agents.tree_search.mcts.MCTSAgent'>",
    "env_preprocessors": [
        {
            "method":"simplify"
        },
        {
            "method": "set_vehicle_property",
            "args": 15
        }
    ]
}

But I believe you can also remove your set_vehicle_property and use the following equivalent configuration:

{
    "__class__": "<class 'rl_agents.agents.tree_search.mcts.MCTSAgent'>",
    "env_preprocessors": [
        {
            "method":"simplify"
        },
        {
            "method": "set_vehicle_field",
            "args": ["target_speed", 15]
        }
    ]
}

You could also use the OPD planning algorithm (scripts/configs/HighwayEnv/agents/DeterministicPlannerAgent/baseline.json), which may be more efficient than UCT.

eleurent avatar Sep 04 '20 14:09 eleurent

I understand it. Thanks for your explanation. I am running the experiments.py script with the configs/HighwayEnv/agents/MCTSAgents/baseline.json.

What I want to do is construct two environments. In the real env, other vehicles have a target_speed of 20, while in the simulation env, other vehicles have a target_speed of 18 (just another number different from 20). So I think the settings in the .json file cannot solve my problem, and I need to change the attribute whenever the current state is copied.

rongliangzi avatar Sep 04 '20 15:09 rongliangzi

What I want to do is construct two environments. In the real env, other vehicles have a target_speed of 20, while in the simulation env, other vehicles have a target_speed of 18 (just another number different from 20). So I think the settings in the .json file cannot solve my problem, and I need to change the attribute whenever the current state is copied.

On the contrary, I think the settings in the .json files do exactly that.

  1. You start with a true environment, where other vehicles have a target_speed of 20.
  2. If you define an env preprocessor in the agent configuration (.json), it will be called just before planning (planner.plan()). An env preprocessor always applies its modifications to a new copy of the environment (just like in your set_vehicle_property()), so that the real env is left unchanged. In your case, only the preprocessed environment would have other vehicles with a target_speed of 18.
  3. Then, the planning algorithm uses this modified env as if it were the true one, before recommending a best action to take.
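The three steps above can be sketched as a small loop. This is only an illustration under simplifying assumptions: ToyEnv tracks a single other vehicle that moves at exactly its target speed, and plan() is a hypothetical planner that just rolls the model forward. None of these names come from rl-agents or highway_env.

```python
import copy


class ToyEnv:
    """Hypothetical env: one other vehicle moving at its target speed."""

    def __init__(self, target_speed):
        self.other_target_speed = target_speed
        self.other_position = 0.0

    def step(self, dt=1.0):
        self.other_position += self.other_target_speed * dt

    def set_target_speed(self, value):
        # Preprocessor-style method: modify a copy, leave self untouched.
        env_copy = copy.deepcopy(self)
        env_copy.other_target_speed = value
        return env_copy


def plan(env, horizon):
    """Hypothetical planner: roll the model forward, return the predicted position."""
    for _ in range(horizon):
        env.step()
    return env.other_position


real_env = ToyEnv(target_speed=20)               # step 1: the true environment
predictions = []
for _ in range(3):
    planning_env = real_env.set_target_speed(18)  # step 2: preprocessed copy
    predictions.append(plan(planning_env, horizon=2))  # step 3: plan on the copy
    real_env.step()                               # the real env still uses 20

print(predictions)                  # [36.0, 56.0, 76.0]
print(real_env.other_target_speed)  # 20
print(real_env.other_position)      # 60.0
```

At every step the planner starts from the current real state but predicts with a target speed of 18, while the real environment keeps advancing at 20, which is the two-environment setup described in the question.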

eleurent avatar Sep 07 '20 07:09 eleurent