[AIR output] "iteration" is shown in the output for RL users
What happened + What you expected to happen
I ran learning_tests_impala_torch with the new AIR output. It seems we show "iteration" in the output. I'm not sure whether that's a good thing, since RL users may not be familiar with this "iteration" concept.
In the original design, we planned to show something like: Finished 1000 timesteps [359 timesteps/s] at 2023-02-24 12:35:39. Running time: 2min 14s
Training finished iter 13 at 2023-03-29 09:08:13 (running for 00:03:11.52)
agent_timesteps_total: 389000
connector_metrics: {}
counters:
  num_agent_steps_sampled: 389000
  num_agent_steps_trained: 388500
  num_env_steps_sampled: 389000
  num_env_steps_trained: 388500
  num_samples_added_to_queue: 389000
  num_training_step_calls_since_last_synch_worker_weights: 134
  num_weight_broadcasts: 964
custom_metrics: {}
episode_len_mean: 1721.88
episode_media: {}
episode_reward_max: 36.0
episode_reward_mean: 9.77
episode_reward_min: 4.0
episodes_this_iter: 67
episodes_total: 1710
info:
  learner:
    default_policy:
      custom_metrics: {}
      diff_num_grad_updates_vs_sampler_policy: 10.0
      learner_stats:
        cur_lr: 0.0005
        entropy: 1.0906739234924316
        entropy_coeff: 0.01
        policy_loss: -32.83232116699219
        total_loss: -28.335006713867188
        var_gnorm: 16.424108505249023
        vf_explained_var: 0.6170328259468079
        vf_loss: 19.683231353759766
      model: {}
      num_grad_updates_lifetime: 777.0
  learner_queue:
    size_count: 778
    size_mean: 0.0
    size_quantiles: [0.0, 0.0, 0.0, 0.0, 0.0]
    size_std: 0.0
  num_agent_steps_sampled: 389000
  num_agent_steps_trained: 388500
  num_env_steps_sampled: 389000
  num_env_steps_trained: 388500
  num_samples_added_to_queue: 389000
  num_training_step_calls_since_last_synch_worker_weights: 134
  num_weight_broadcasts: 964
  timing_breakdown:
    learner_dequeue_time_ms: 2772.957
    learner_grad_time_ms: 123.634
    learner_load_time_ms: 4.319
    learner_load_wait_time_ms: 47.829
num_agent_steps_sampled: 389000
num_agent_steps_trained: 388500
num_env_steps_sampled: 389000
num_env_steps_sampled_this_iter: 30750
num_env_steps_trained: 388500
num_env_steps_trained_this_iter: 31000
num_faulty_episodes: 0
num_healthy_workers: 10
num_in_flight_async_reqs: 20
num_remote_worker_restarts: 0
num_steps_trained_this_iter: 31000
perf:
  cpu_util_percent: 34.94117647058823
  ram_util_percent: 5.211764705882353
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
  mean_action_processing_ms: 0.6395067034460499
  mean_env_render_ms: 0.0
  mean_env_wait_ms: 7.840870172264184
  mean_inference_ms: 6.614064184668028
  mean_raw_obs_processing_ms: 2.9528597540097277
sampler_results:
  connector_metrics: {}
  custom_metrics: {}
  episode_len_mean: 1721.88
  episode_media: {}
  episode_reward_max: 36.0
  episode_reward_mean: 9.77
  episode_reward_min: 4.0
  episodes_this_iter: 67
  hist_stats:
    episode_lengths: [1414, 1306, 1641, 1446, 1234, 2026, 1600, 2454, 1359, 1572,
      1411, 1471, 1463, 1269, 1347, 1083, 2344, 1095, 1956, 1603, 1255, 2218, 1208,
      1943, 1483, 1158, 2108, 1073, 1535, 2590, 1804, 1802, 2109, 1783, 1099, 1258,
      1211, 1826, 2480, 1977, 1649, 1159, 1598, 1972, 2280, 2026, 1732, 1167, 1884,
      1599, 1722, 2156, 1723, 1767, 1387, 1849, 2061, 2356, 1875, 1727, 2524, 1620,
      1926, 1507, 1902, 1999, 1914, 1514, 1699, 1095, 2081, 1632, 1520, 1578, 2329,
      985, 1681, 1719, 1836, 1306, 2122, 1726, 1804, 2020, 2076, 1235, 1074, 1970,
      1853, 1836, 1228, 1431, 2112, 1946, 2793, 1822, 2044, 1946, 2200, 1880]
    episode_reward: [9.0, 12.0, 7.0, 7.0, 5.0, 11.0, 7.0, 17.0, 10.0, 11.0, 9.0, 6.0,
      6.0, 6.0, 9.0, 4.0, 13.0, 11.0, 12.0, 7.0, 5.0, 10.0, 5.0, 10.0, 14.0, 4.0,
      10.0, 4.0, 10.0, 15.0, 7.0, 11.0, 9.0, 8.0, 11.0, 5.0, 8.0, 16.0, 13.0, 8.0,
      6.0, 12.0, 6.0, 9.0, 11.0, 13.0, 7.0, 4.0, 11.0, 7.0, 10.0, 15.0, 7.0, 15.0,
      6.0, 8.0, 10.0, 36.0, 8.0, 8.0, 14.0, 9.0, 11.0, 13.0, 15.0, 9.0, 8.0, 5.0,
      8.0, 11.0, 14.0, 11.0, 6.0, 14.0, 20.0, 4.0, 11.0, 8.0, 8.0, 12.0, 14.0, 10.0,
      10.0, 11.0, 10.0, 4.0, 4.0, 9.0, 8.0, 8.0, 4.0, 5.0, 10.0, 11.0, 20.0, 13.0,
      14.0, 8.0, 13.0, 9.0]
  num_faulty_episodes: 0
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.6395067034460499
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 7.840870172264184
    mean_inference_ms: 6.614064184668028
    mean_raw_obs_processing_ms: 2.9528597540097277
time_this_iter_s: 11.578344583511353
time_total_s: 149.75555968284607
timers:
  sample_time_ms: 0.242
  synch_weights_time_ms: 0.027
  training_iteration_time_ms: 0.354
timesteps_total: 389000
training_iteration: 13
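For reference, the timesteps-based line from the original design could presumably be assembled from keys that are already present in the result above (timesteps_total, time_total_s). A rough sketch; the helper name is made up for illustration:

```python
from datetime import datetime

def timesteps_progress_line(result: dict) -> str:
    """Hypothetical helper: format a timesteps-based progress line from
    standard result keys instead of training_iteration."""
    ts = result["timesteps_total"]
    elapsed = result["time_total_s"]
    rate = ts / elapsed if elapsed else 0.0
    mins, secs = divmod(int(elapsed), 60)
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"Finished {ts} timesteps [{rate:.0f} timesteps/s] at {now}. "
        f"Running time: {mins}min {secs}s"
    )
```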
Versions / Dependencies
nightly
Reproduction script
learning_tests_impala_torch
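For a quicker local repro than the full release test, something along these lines should surface the same AIR progress output (a minimal sketch; the env, worker count, and stopping criterion are illustrative, not the release-test settings):

```python
from ray import air, tune

# Minimal stand-in for learning_tests_impala_torch: run IMPALA through Tune
# so the new AIR progress output is exercised. Settings are illustrative only.
tuner = tune.Tuner(
    "IMPALA",
    param_space={
        "env": "CartPole-v1",
        "framework": "torch",
        "num_workers": 2,
    },
    run_config=air.RunConfig(stop={"timesteps_total": 50000}),
)
tuner.fit()
```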
Issue Severity
Low: It annoys or frustrates me.
@kouroshHakha @gjoliver Do you think this is a P1 issue (i.e., one we should fix before we expose the new design to users by default in 2.5)?
I think we should keep this in, as it's an important (and default) metric for schedulers and checkpoint management.
Training finished iter 13 at 2023-03-29 09:08:13 (running for 00:03:11.52)
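For context, this is roughly how training_iteration is consumed today (an illustrative snippet, not tied to this test):

```python
from ray import air
from ray.tune.schedulers import ASHAScheduler

# training_iteration is the default time attribute for Tune schedulers ...
scheduler = ASHAScheduler(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    max_t=100,
)

# ... and checkpoint cadence is also expressed in iterations.
checkpointing = air.CheckpointConfig(checkpoint_frequency=10)
```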
@krfricke Actually, I'm referring to the sentence before the reported results.
In the original design, we planned to use timesteps instead of iterations, something like this:
Training finished 1000 timesteps [359 timesteps/s] at 2023-02-24 12:35:39. Running time: 2min 14s
Ah I see.
I may try to tackle this after https://github.com/ray-project/ray/pull/34951 is merged.
Actually, if it's OK, I'd like to punt this for later. We're basically targeting an RLlib-specific progress reporter here, and it's not easy to shoehorn that functionality in without introducing more advanced context management. I'm pretty sure we'll do this (see also the discussion in https://github.com/ray-project/ray/pull/35003), but until that is done, let's deprioritize this. Ok?
cc @sven1977 @kouroshHakha
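Until then, a user-level approximation could be a custom Tune callback that prints a timesteps-based line itself (a sketch against the public tune.Callback API, not the AIR output engine integration discussed above):

```python
from ray import tune

class TimestepsReporter(tune.Callback):
    """Illustrative callback: report progress in timesteps rather than iterations."""

    def on_trial_result(self, iteration, trials, trial, result, **info):
        ts = result.get("timesteps_total", 0)
        elapsed = result.get("time_total_s", 0.0)
        rate = ts / elapsed if elapsed else 0.0
        print(
            f"Finished {ts} timesteps [{rate:.0f} timesteps/s] "
            f"(running for {elapsed:.0f}s)"
        )

# Would be passed in via air.RunConfig(callbacks=[TimestepsReporter()]).
```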
SGTM
This P2 issue has seen no activity in the past 2 years. It will be closed in 2 weeks as part of ongoing cleanup efforts.
Please comment and remove the pending-cleanup label if you believe this issue should remain open.
Thanks for contributing to Ray!