[RLlib] Attribute error when trying to compute action after training Multi Agent PPO with New API Stack
What happened + What you expected to happen
After training Multi Agent PPO with the new API stack under the guidance of how-to-use-the-new-api-stack, I tried to compute actions:
saved_algorithm = Algorithm.from_checkpoint(
    checkpoint=algorithm_path,
    policy_ids={"controlled_vehicle_0", "controlled_vehicle_1"},
    policy_mapping_fn=lambda agent_id, episode, **kwargs: f"controlled_vehicle_{agent_id}",
)
print("saved_algorithm type:", type(saved_algorithm))

# Evaluate the model
obs, info = env.reset()
print("obs:", obs)
actions = {}
for agent_id, agent_obs in obs.items():
    policy_id = f"controlled_vehicle_{agent_id}"
    action = saved_algorithm.get_policy(policy_id).compute_single_action(agent_obs)
    actions[agent_id] = action
print("action", actions)
but I get the error message:
AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'
I also tried another approach, like:
action = saved_algorithm.compute_single_action(agent_obs, policy_id)
but I still get the same error message: AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'.
I have seen a similar issue in #40312, are these two issues the same?
The detailed error message is as follows:
Traceback (most recent call last):
  File "test_evaluate.py", line 151, in <module>
    evaluate_agent(saved_algorithm, env)
  File "test_evaluate.py", line 112, in evaluate_agent
    action = saved_algorithm.get_policy(policy_id).compute_single_action(agent_obs)
  File "C:\Users\Ice Cream\miniconda3\envs\env_highway\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\Ice Cream\miniconda3\envs\env_highway\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2051, in get_policy
    return self.workers.local_worker().get_policy(policy_id)
AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'
Before calling this method, I also printed the relevant info, and this part looks normal:
saved_algorithm type: <class 'ray.rllib.algorithms.ppo.ppo.PPO'>
saved_algorithm.get_config() <ray.rllib.algorithms.ppo.ppo.PPOConfig object at 0x0000014B0E4D4370>
This output came from the code:
print("saved_algorithm type:", type(saved_algorithm))
print("saved_algorithm.get_config()", saved_algorithm.get_config())
Versions / Dependencies
Ray 2.10.0
Python 3.8.18
Windows 11
Reproduction script
The code used for training is as follows:
import os

from ray import tune
from ray.train import RunConfig
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.multi_agent_env_runner import MultiAgentEnvRunner
from ray.tune.registry import register_env

# create_env and folder_path are defined elsewhere in the original script.
register_env("ray_dict_highway_env", create_env)

config = (
    PPOConfig()
    .environment(env="ray_dict_highway_env")
    .experimental(_enable_new_api_stack=True)
    .rollouts(env_runner_cls=MultiAgentEnvRunner)
    .resources(
        num_learner_workers=0,
        num_gpus_per_learner_worker=1,
        num_cpus_for_local_worker=1,
    )
    .training(model={"uses_new_env_runners": True})
    .multi_agent(
        policies={
            "controlled_vehicle_0",
            "controlled_vehicle_1",
        },
        policy_mapping_fn=lambda agent_id, episode, **kwargs: f"controlled_vehicle_{agent_id}",
    )
    .framework("torch")
)

current_script_directory = os.path.dirname(os.path.abspath(__file__))
ray_result_path = os.path.join(current_script_directory, folder_path, "ray_results")

tuner = tune.Tuner(
    "PPO",
    run_config=RunConfig(
        storage_path=ray_result_path,
        name="2-agent-PPO",
        stop={"timesteps_total": 5e5},
    ),
    param_space=config.to_dict(),
)
results = tuner.fit()
And the code for loading checkpoints:
from ray.rllib.algorithms.algorithm import Algorithm

algorithm_path = r"D:\DRL_Project\DRL_highway\experiments\hw-fast-ma-dict-v0_rllib-mappo\2024-04-01_01-28\ray_results\2-agent-PPO\PPO_ray_dict_highway_env_1c7ab_00000_0_2024-04-01_01-28-38\checkpoint_000000"
saved_algorithm = Algorithm.from_checkpoint(
    checkpoint=algorithm_path,
    policy_ids={"controlled_vehicle_0", "controlled_vehicle_1"},
    policy_mapping_fn=lambda agent_id, episode, **kwargs: f"controlled_vehicle_{agent_id}",
)
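As a hedged aside (the metric key below is an assumption and may differ across Ray versions and API stacks), the hard-coded algorithm_path above could also be obtained from the results object returned by tuner.fit():

# Sketch: retrieve the best checkpoint from the Tune results instead of
# hard-coding the path. "episode_reward_mean" is an assumed metric name and
# may live under a different key on the new API stack.
best_result = results.get_best_result(metric="episode_reward_mean", mode="max")
best_checkpoint = best_result.checkpoint   # a ray.train.Checkpoint object
algorithm_path = best_checkpoint.path      # filesystem path usable by Algorithm.from_checkpoint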
Issue Severity
High: It blocks me from completing my task.
It seems I have initially solved this issue by referring to the solutions in #40312 and the comments in the ray.rllib.core.rl_module code, using the following code to replace the original code:
import torch

# Evaluate the model
obs, info = env.reset()
print("obs:", obs)
actions = {}
for agent_id, agent_obs in obs.items():
    # Determine the policy ID for the current agent using the policy mapping function
    policy_id = f"controlled_vehicle_{agent_id}"
    # Compute actions for each agent
    rl_module = saved_algorithm.get_module(policy_id)
    fwd_ins = {"obs": torch.Tensor([agent_obs])}
    fwd_outputs = rl_module.forward_inference(fwd_ins)
    action_dist_class = rl_module.get_inference_action_dist_cls()
    action_dist = action_dist_class.from_logits(
        fwd_outputs["action_dist_inputs"]
    )
    action = action_dist.sample()[0].numpy()
    actions[agent_id] = action
# actions = saved_algorithm.compute_actions(obs)
print("actions: ", actions)
The output is as follows:
actions: {0: array(4, dtype=int64), 1: array(3, dtype=int64)}
It seems to be working.
But when I used a similar approach to run evaluations over multiple episodes, the results were significantly worse than the outcomes reported during training: episode_len_mean dropped from about 29 to 7, and episode_reward_mean decreased from about 42 to 10. After recording videos, it was also evident that the agent did have its own strategy π, but the performance was relatively poor. I suspect that the steps I used to compute actions directly through the rl_module might differ from those actually used during training, but I am not clear on the exact action-computation steps used during training. Could I have done something wrong in this part?
I also still wonder why I can't directly use saved_algorithm.get_policy(policy_id).compute_single_action(agent_obs) or saved_algorithm.compute_single_action(agent_obs, policy_id) to compute actions. Or does this mean that this invocation has a new syntax in the new API stack?
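For what it's worth, a hedged sketch (an assumption, not a confirmed fix) of two things that can make a hand-rolled rl_module loop behave differently from training-time evaluation: the EnvRunner applies its env-to-module connector pipeline (e.g., observation preprocessing) before calling the module, and the rewards reported during training come from stochastically sampled actions, while for evaluation a greedy action may be preferable. The snippet below reuses saved_algorithm, policy_id, and agent_obs from the code above; to_deterministic() is part of RLlib's Distribution API in recent releases, so verify it exists in your Ray version:

import torch

rl_module = saved_algorithm.get_module(policy_id)
fwd_outputs = rl_module.forward_inference({"obs": torch.Tensor([agent_obs])})
action_dist_cls = rl_module.get_inference_action_dist_cls()
action_dist = action_dist_cls.from_logits(fwd_outputs["action_dist_inputs"])

sampled_action = action_dist.sample()[0].numpy()                    # stochastic, as during rollouts
greedy_action = action_dist.to_deterministic().sample()[0].numpy()  # max-likelihood action (if supported)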
I am facing a similar issue with the SingleAgentEnvRunner - see screenshot attached.
I don't know why this issue is P2; it should be P0.
This issue still persists. Hey @simonsays1980 @sven1977, is there any plan to fix this?
I'm unfortunately having this issue too:
ppo_agents = Algorithm.from_checkpoint(checkpoint=checkpoint_path)
actions = ppo_agents.compute_actions(observations=observations)
Results in:
AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'
Applying a fix similar to what @Dr-IceCream did, which follows #40312, gives very degraded results, to the point where it's unusable :(
Since we've had a similar P1 issue, should this be upgraded? @simonsays1980 @sven1977
Same error here. I'm unable to use a trained model.
I am also seeing this error in both PPO and SAC. Is there a recommended workaround or a stable commit to roll back to?
I see this error with my single agent too. Is it related to a mismatch between the new API stack and the old one, or something else? Is there no solution for this? 🙄
I found a solution for this. I changed the action call method (according to this: https://docs.ray.io/en/master/rllib/rllib-training.html) as below:
import pathlib
from ray.rllib.core.rl_module import RLModule

# Create only the neural network (RLModule) from our checkpoint.
rl_module = RLModule.from_checkpoint(
    pathlib.Path(best_checkpoint) / "learner_group" / "learner" / "rl_module"
)["default_policy"]
And for computing the action:
while not terminated and not truncated:
    env.render()
    # Compute the next action from a batch (B=1) of observations.
    torch_obs_batch = torch.from_numpy(np.array([obs]))
    action_logits = rl_module.forward_inference({"obs": torch_obs_batch})[
        "action_dist_inputs"
    ]
    # The default RLModule used here produces action logits (from which
    # we'll have to sample an action or use the max-likelihood one).
    action_probs = torch.nn.functional.softmax(action_logits[0], dim=-1)
    action = np.random.choice(len(action_probs), p=action_probs.numpy())
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
There does not seem to be any incentive to fix this, as this part of the code is labelled "OldAPIStack". I am going to try to export the model using torch directly. It would probably still be nice to have some convenience function in the Algorithm class to do the export directly.
Seeing the same issue.
Additionally, Algorithm.from_checkpoint(checkpoint) gives me an AttributeError: 'Checkpoint' object has no attribute 'as_posix', whereas Algorithm.from_checkpoint(checkpoint.path) does not.
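A minimal sketch of that workaround, assuming checkpoint is the ray.train.Checkpoint object returned by Tune:

# Pass the checkpoint's filesystem path (a string) rather than the Checkpoint object.
saved_algorithm = Algorithm.from_checkpoint(checkpoint.path)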
Unfortunately I have not been able to use torch.onnx export methods so far to export the model. I need dictionary inputs for my model and I have not found a way to use the onnx export methods with dictionary input. So the situation is a bit frustrating right now. Hope I find a solution soon. Any help is appreciated as I cannot export my model right now.
Update: I was able to export my model using torch.onnx.export together with Algorithm.get_module, which is part of the new API stack. The key was to use a dictionary input in the form of ({key1: value1, key2: value2, ...}, {}) for the args argument of the export function (note the second, in my case empty, dictionary). It is also important to account for batch size in the values passed via the dictionary. Here is an example that shows my use case:
rl_module_player_white = algo.get_module("policy_white")

batch_size = 1
dummy_input = (
    {
        "obs": torch.tensor(
            np.array(
                [[np.random.randint(0, n) for n in game_state_multi_discrete_sizes] for _ in range(batch_size)]
            ),
            dtype=torch.float32,
        ),
        "action_mask": torch.tensor(
            np.random.randint(0, 2, (batch_size, action_space_length)), dtype=torch.float32
        ),
    },
    {},
)

torch.onnx.export(
    rl_module_player_white,       # The model to export
    dummy_input,                  # Dummy input for tracing
    "./export_white/model.onnx",  # Output path
    export_params=True,           # Store parameters in the model file
    opset_version=17,             # ONNX version
    do_constant_folding=True,     # Optimize constant expressions
    input_names=["obs", "action_mask"],     # Names for model inputs
    output_names=["embeddings", "output"],  # Names for model outputs
)
One more hint: to properly set the output names, just check the output of your rl_module with the dummy input to see exactly which outputs you are getting. For the example above this would look like:
dummy_output = rl_module_player_white(dummy_input[0])
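As a follow-up sketch (an addition here, assuming onnxruntime is installed; the input/output names and the variables game_state_multi_discrete_sizes and action_space_length come from the export snippet above), the exported file can then be run completely outside of Ray:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("./export_white/model.onnx")
# Build a dummy batch with the same shapes as the export-time dummy input.
feed = {
    "obs": np.zeros((1, len(game_state_multi_discrete_sizes)), dtype=np.float32),
    "action_mask": np.ones((1, action_space_length), dtype=np.float32),
}
# Request only the head that was named "output" in the export call above.
(action_logits,) = session.run(["output"], feed)
print(action_logits.shape)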
I am facing the same error with Ray RLlib versions 2.40.0 and 2.40.1. Is there any news about this? I can use the RLModule approach, but the Algorithm seems simpler and more straightforward.
@simonsays1980 @sven1977
The problem still persists (AttributeError: 'SingleAgentEnvRunner' object has no attribute 'get_policy'). Is there still no reliable fix for this??
Having the same problem. No intention to solve it?!
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/ray/rllib/algorithms/algorithm.py in get_policy(self, policy_id)
   2107             policy_id: ID of the policy to return.
   2108         """
-> 2109         return self.env_runner.get_policy(policy_id)
   2110
   2111     @PublicAPI

AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'
Hi @AlizargarDeu: My saved checkpoint directory does not seem to have a learner_group subdirectory. Is that something that needs to be specified while saving the checkpoints?
Also @simonsays1980: Is it possible to update the issue to P0? Being unable to use a trained model on the latest stable version of RLlib should, IMO, be getting a higher priority.
Hi @AdithyaRaman, no, actually when you use the new API stack these will be created automatically in the saved training directory, somewhere in your temp directory. It looks like this: root > temp > checkpoint_tmp... > env_runners, learner_group, ... When you give the saved directory, it will be loaded accordingly.
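For orientation, a rough sketch (directory names taken from this thread; the exact layout can vary between Ray versions) of where the RLModule lives inside such a checkpoint, which is why the load call from the earlier comments works:

import pathlib
from ray.rllib.core.rl_module import RLModule

checkpoint_dir = "..."  # the saved checkpoint directory described above

# Approximate layout (per this thread):
#   <checkpoint_dir>/
#       env_runners/
#       learner_group/
#           learner/
#               rl_module/
#                   default_policy/   # or one folder per policy ID in multi-agent setups
rl_module = RLModule.from_checkpoint(
    pathlib.Path(checkpoint_dir) / "learner_group" / "learner" / "rl_module"
)["default_policy"]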
Running into this issue as well on the new API stack, so IMO +1 that the priority should be bumped up to P0 as well.
Same question here, but I have a new idea to solve it: restore the trained model, add some output in the environment, and after that continue training for one more iteration. 😵💫😵💫😵💫
Does that really work? I have been looking for viable work-arounds.
@AlizargarDeu: Got it. I am now successfully able to call rl_module.forward_inference(...) and get something, but I am not sure it is giving me actions that I can plug into the environment. When I look at rl_module.config to check the action_space dimension, I see the output below, which says that my action space should be a vector of length 2. (And this is what I want to see; my environment needs an action in this specified action_space.)
RLModuleConfig(observation_space=Box(0.0, 6.0, (15,), float64), action_space=Box(-1.0, 1.0, (2,), float64), inference_only=False, learner_only=False, model_config_dict={'twin_q': True}, catalog_class=<class 'ray.rllib.algorithms.sac.sac_catalog.SACCatalog'>)
However, the output of rl_module.forward_inference({"obs": torch_obs_batch}) is a vector of length 4. I am a bit confused about how to use this to get an action that I can pass to an environment step.
I think your problem is somewhere between the trained model and your environment. Why should it give you a vector of length 4 when your environment has an action space of size 2?! Maybe this helps you for training:
# Define the PPO configuration
config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment(
        env="YourEnvironmentClassName",
        env_config={},  # your desired configuration (if needed)
    )
    .training(
        lr=selected_lr,                  # Learning rate
        entropy_coeff=selected_entropy,  # Encourage exploration with entropy regularization
    )
    .env_runners(num_env_runners=number_of_env)  # Number of parallel environments
)
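And, as a small usage sketch under the assumption that the config above builds cleanly against your registered environment, the algorithm can then be built, trained for a few iterations, and checkpointed for later restoring (the exact return type of save() differs between Ray versions):

algo = config.build()
for _ in range(10):            # a few training iterations
    result = algo.train()
checkpoint_info = algo.save()  # keep this to restore the algorithm later
algo.stop()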
After several hours of staring at the screen, I figured it out. The output was a Gaussian distribution with a mean and a standard deviation for each of the two action dimensions, hence twice the number of values.
Hello @AdithyaRaman,
Could you please share the code you used to extract the Gaussian distribution tensor, and how you converted it into an action that the env can understand? I tried the code below, but the agent is not behaving as expected.
import pathlib
import torch
import numpy as np
import ray
from ray.tune.registry import register_env
from raylib.custom_env import SimpleEnv
from ray.rllib.core.rl_module import RLModule
from torch.distributions import Normal

# Initialize Ray
ray.init(ignore_reinit_error=True)

# Register the environment
register_env("my_custom_env", lambda config: SimpleEnv())

# Define the checkpoint directory
checkpoint_dir = "/models_sac_1"

# Load the RLModule from the checkpoint
rl_module = RLModule.from_checkpoint(
    pathlib.Path(checkpoint_dir) / "learner_group" / "learner" / "rl_module"
)["default_policy"]

# Test the agent in the environment
env = SimpleEnv()
obs, _ = env.reset()
done = False
truncated = False
total_reward = 0

# Run the agent in the environment
while not (done or truncated):
    env.render()  # Render the environment
    # Compute the next action using the RLModule
    torch_obs_batch = torch.from_numpy(np.array([obs])).to(torch.float32)  # Ensure dtype is float32
    action_logits = rl_module.forward_inference({"obs": torch_obs_batch})  # Perform inference
    action_dist_inputs = action_logits['action_dist_inputs']
    log_std, mean = torch.chunk(action_dist_inputs, 2, dim=-1)
    # Convert log_std to std
    std = torch.exp(log_std)
    # Sample action from the Gaussian distribution
    action = mean + std * torch.randn_like(std)
    print(action)
    action_np = action.detach().cpu().numpy()  # This will be the vector for the env.step() function
    obs, reward, done, truncated, info = env.step(action_np[0])  # Take the step in the environment
    total_reward += reward

print(f"Total reward: {total_reward}")

# Clean up
ray.shutdown()
The agent performed very well during training, and always reached it's destination, so I assume that the problem is not my reward function. Thank you very much for your time!
Regards
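One hedged guess (an assumption, not something confirmed in this thread): SAC's policy in RLlib uses a squashed (tanh) Gaussian, so manually sampling from an unsquashed Normal built from chunked action_dist_inputs can produce actions outside the Box(-1, 1) range, and the mean/log-std halves may also be ordered differently than assumed. A sketch that instead replaces the sampling part of the loop above with the module's own inference distribution class, following the same pattern as the earlier PPO workaround in this thread:

action_outputs = rl_module.forward_inference({"obs": torch_obs_batch})
dist_cls = rl_module.get_inference_action_dist_cls()
action_dist = dist_cls.from_logits(action_outputs["action_dist_inputs"])
action = action_dist.sample()[0].detach().cpu().numpy()
# For a greedy evaluation action, action_dist.to_deterministic().sample()
# can be used instead, if that method is available in your Ray version.
obs, reward, done, truncated, info = env.step(action)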
Does that really work? I have been looking for viable work-arounds.
Yes, it really works: continue training for one more iteration and you get the result. 😵💫
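For anyone trying this, a minimal sketch of that idea, assuming algorithm_path points at the saved checkpoint: restore the full Algorithm and run a single extra training iteration so any prints or logging added to the environment fire while episodes are sampled.

from ray.rllib.algorithms.algorithm import Algorithm

algo = Algorithm.from_checkpoint(algorithm_path)
result = algo.train()  # one additional iteration; env output appears during sampling
algo.stop()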
Hello @alexhasenclever, is this the updated script with which you got good results? What was your action_space? 2? Discrete or continuous? I tried with 6 discrete actions and got an error!