[tune] Error saving checkpoint based on nested metric score
What happened + What you expected to happen
I tried running a simple RL training with RLlib and set checkpoint_score_attr="evaluation/episode_reward_mean" in tune.run(). The training ran properly except for checkpoint saving, which produced this error message:
2022-08-09 07:20:15,138 ERROR checkpoint_manager.py:320 -- Result dict has no key: evaluation/episode_reward_mean. checkpoint_score_attr must be set to a key in the result dict. Valid keys are: ['evaluation', 'custom_metrics', 'episode_media', 'num_recreated_workers', 'info', 'sampler_results', 'episode_reward_max', 'episode_reward_min', 'episode_reward_mean', 'episode_len_mean', 'episodes_this_iter', 'policy_reward_min', 'policy_reward_max', 'policy_reward_mean', 'hist_stats', 'sampler_perf', 'num_faulty_episodes', 'num_healthy_workers', 'num_agent_steps_sampled', 'num_agent_steps_trained', 'num_env_steps_sampled', 'num_env_steps_trained', 'num_env_steps_sampled_this_iter', 'num_env_steps_trained_this_iter', 'timesteps_total', 'num_steps_trained_this_iter', 'agent_timesteps_total', 'timers', 'counters', 'done', 'episodes_total', 'training_iteration', 'trial_id', 'experiment_id', 'date', 'timestamp', 'time_this_iter_s', 'time_total_s', 'pid', 'hostname', 'node_ip', 'config', 'time_since_restore', 'timesteps_since_restore', 'iterations_since_restore', 'warmup_time', 'perf', 'experiment_tag']
I saw that this behavior was previously reported (~~#14374~~ #14377) and resolved (~~#14375~~ #14379), but it has recurred. Apparently, this line somehow doesn't reflect the mentioned pull request.
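For context, Tune ships a helper that flattens nested result dicts with a "/" delimiter, which is presumably how a nested checkpoint_score_attr is meant to resolve. A minimal sketch, assuming ray.tune.utils.flatten_dict with its default "/" delimiter (the example values are made up):

from ray.tune.utils import flatten_dict

# A nested result dict shaped like the one RLlib reports.
result = {"evaluation": {"episode_reward_mean": 123.4}}

# flatten_dict joins nested keys with "/", so the nested metric
# becomes addressable under the flat key used in the repro below.
flat = flatten_dict(result, delimiter="/")
assert flat["evaluation/episode_reward_mean"] == 123.4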
Versions / Dependencies
ray[rllib]
ray==2.0.0rc0
Python 3.8.10
Tested on a headless server with a virtual display (probably irrelevant)
Reproduction script
I think the script in the old issue is still valid, but I tested with this similar script:
import ray
from ray import tune
from pyvirtualdisplay import Display

if __name__ == "__main__":
    ray.init()
    config = {
        "env": "CartPole-v1",
        "framework": "torch",
        "timesteps_per_iteration": 10,
        "evaluation_interval": 1,
        "evaluation_num_episodes": 1,
    }
    with Display(visible=False, size=(1400, 900)) as disp:
        analysis = tune.run(
            "DQN",
            stop={"num_env_steps_trained": 2000},
            config=config,
            num_samples=1,
            checkpoint_freq=1,
            keep_checkpoints_num=1,
            checkpoint_score_attr="evaluation/episode_reward_mean",
        )
    ray.shutdown()
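As a stopgap until a fix lands, one possible workaround is to mirror the nested metric at a top-level key via an RLlib callback and score on that key instead. This is only a sketch: the class name MirrorEvalReward and the key eval_episode_reward_mean are my own, and it assumes the Ray 2.0 DefaultCallbacks.on_train_result signature:

from ray.rllib.algorithms.callbacks import DefaultCallbacks

class MirrorEvalReward(DefaultCallbacks):
    # Copy the nested evaluation metric into a top-level key so that
    # checkpoint_score_attr can find it without a nested lookup.
    def on_train_result(self, *, algorithm, result, **kwargs):
        evaluation = result.get("evaluation") or {}
        if "episode_reward_mean" in evaluation:
            result["eval_episode_reward_mean"] = evaluation["episode_reward_mean"]

With this, set config["callbacks"] = MirrorEvalReward and pass checkpoint_score_attr="eval_episode_reward_mean" to tune.run().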
Issue Severity
High: It blocks me from completing my task.
Thanks for reporting this, @Juno-T! This is indeed a regression introduced by a PR. Putting up a fix and a test now.
Thanks for your help on this, @xwjiang2010! :)