
[AIR output] "iteration" is shown in the output for RL users

Open scottsun94 opened this issue 2 years ago • 7 comments

What happened + What you expected to happen

I ran learning_tests_impala_torch with the new AIR output. It seems that we show "iteration" in the output. I'm not sure whether that's a good thing, because users may not be familiar with this "iteration" concept.

In the original design, we planned to show something like: Finished 1000 timesteps [359 timesteps/s] at 2023-02-24 12:35:39. Running time: 2min 14s
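
For illustration, here is a rough sketch (not the actual reporter code) of how that line could be composed from keys that already appear in the result dict, such as timesteps_total, time_total_s, and num_env_steps_sampled_this_iter:

import datetime

def format_progress(result: dict) -> str:
    # Sketch only: approximate the throughput from this iteration's counters.
    timesteps = result["timesteps_total"]
    rate = result["num_env_steps_sampled_this_iter"] / result["time_this_iter_s"]
    finished_at = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    running = datetime.timedelta(seconds=int(result["time_total_s"]))
    return (
        f"Finished {timesteps} timesteps [{rate:.0f} timesteps/s] "
        f"at {finished_at}. Running time: {running}"
    )

For reference, the current output from the run looks like this: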

Training finished iter 13 at 2023-03-29 09:08:13 (running for 00:03:11.52)
agent_timesteps_total: 389000
connector_metrics: {}
counters:
  num_agent_steps_sampled: 389000
  num_agent_steps_trained: 388500
  num_env_steps_sampled: 389000
  num_env_steps_trained: 388500
  num_samples_added_to_queue: 389000
  num_training_step_calls_since_last_synch_worker_weights: 134
  num_weight_broadcasts: 964
custom_metrics: {}
episode_len_mean: 1721.88
episode_media: {}
episode_reward_max: 36.0
episode_reward_mean: 9.77
episode_reward_min: 4.0
episodes_this_iter: 67
episodes_total: 1710
info:
  learner:
    default_policy:
      custom_metrics: {}
      diff_num_grad_updates_vs_sampler_policy: 10.0
      learner_stats:
        cur_lr: 0.0005
        entropy: 1.0906739234924316
        entropy_coeff: 0.01
        policy_loss: -32.83232116699219
        total_loss: -28.335006713867188
        var_gnorm: 16.424108505249023
        vf_explained_var: 0.6170328259468079
        vf_loss: 19.683231353759766
      model: {}
      num_grad_updates_lifetime: 777.0
  learner_queue:
    size_count: 778
    size_mean: 0.0
    size_quantiles: [0.0, 0.0, 0.0, 0.0, 0.0]
    size_std: 0.0
  num_agent_steps_sampled: 389000
  num_agent_steps_trained: 388500
  num_env_steps_sampled: 389000
  num_env_steps_trained: 388500
  num_samples_added_to_queue: 389000
  num_training_step_calls_since_last_synch_worker_weights: 134
  num_weight_broadcasts: 964
  timing_breakdown:
    learner_dequeue_time_ms: 2772.957
    learner_grad_time_ms: 123.634
    learner_load_time_ms: 4.319
    learner_load_wait_time_ms: 47.829
num_agent_steps_sampled: 389000
num_agent_steps_trained: 388500
num_env_steps_sampled: 389000
num_env_steps_sampled_this_iter: 30750
num_env_steps_trained: 388500
num_env_steps_trained_this_iter: 31000
num_faulty_episodes: 0
num_healthy_workers: 10
num_in_flight_async_reqs: 20
num_remote_worker_restarts: 0
num_steps_trained_this_iter: 31000
perf:
  cpu_util_percent: 34.94117647058823
  ram_util_percent: 5.211764705882353
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
  mean_action_processing_ms: 0.6395067034460499
  mean_env_render_ms: 0.0
  mean_env_wait_ms: 7.840870172264184
  mean_inference_ms: 6.614064184668028
  mean_raw_obs_processing_ms: 2.9528597540097277
sampler_results:
  connector_metrics: {}
  custom_metrics: {}
  episode_len_mean: 1721.88
  episode_media: {}
  episode_reward_max: 36.0
  episode_reward_mean: 9.77
  episode_reward_min: 4.0
  episodes_this_iter: 67
  hist_stats:
    episode_lengths: [1414, 1306, 1641, 1446, 1234, 2026, 1600, 2454, 1359, 1572,
      1411, 1471, 1463, 1269, 1347, 1083, 2344, 1095, 1956, 1603, 1255, 2218, 1208,
      1943, 1483, 1158, 2108, 1073, 1535, 2590, 1804, 1802, 2109, 1783, 1099, 1258,
      1211, 1826, 2480, 1977, 1649, 1159, 1598, 1972, 2280, 2026, 1732, 1167, 1884,
      1599, 1722, 2156, 1723, 1767, 1387, 1849, 2061, 2356, 1875, 1727, 2524, 1620,
      1926, 1507, 1902, 1999, 1914, 1514, 1699, 1095, 2081, 1632, 1520, 1578, 2329,
      985, 1681, 1719, 1836, 1306, 2122, 1726, 1804, 2020, 2076, 1235, 1074, 1970,
      1853, 1836, 1228, 1431, 2112, 1946, 2793, 1822, 2044, 1946, 2200, 1880]
    episode_reward: [9.0, 12.0, 7.0, 7.0, 5.0, 11.0, 7.0, 17.0, 10.0, 11.0, 9.0, 6.0,
      6.0, 6.0, 9.0, 4.0, 13.0, 11.0, 12.0, 7.0, 5.0, 10.0, 5.0, 10.0, 14.0, 4.0,
      10.0, 4.0, 10.0, 15.0, 7.0, 11.0, 9.0, 8.0, 11.0, 5.0, 8.0, 16.0, 13.0, 8.0,
      6.0, 12.0, 6.0, 9.0, 11.0, 13.0, 7.0, 4.0, 11.0, 7.0, 10.0, 15.0, 7.0, 15.0,
      6.0, 8.0, 10.0, 36.0, 8.0, 8.0, 14.0, 9.0, 11.0, 13.0, 15.0, 9.0, 8.0, 5.0,
      8.0, 11.0, 14.0, 11.0, 6.0, 14.0, 20.0, 4.0, 11.0, 8.0, 8.0, 12.0, 14.0, 10.0,
      10.0, 11.0, 10.0, 4.0, 4.0, 9.0, 8.0, 8.0, 4.0, 5.0, 10.0, 11.0, 20.0, 13.0,
      14.0, 8.0, 13.0, 9.0]
  num_faulty_episodes: 0
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.6395067034460499
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 7.840870172264184
    mean_inference_ms: 6.614064184668028
    mean_raw_obs_processing_ms: 2.9528597540097277
time_this_iter_s: 11.578344583511353
time_total_s: 149.75555968284607
timers:
  sample_time_ms: 0.242
  synch_weights_time_ms: 0.027
  training_iteration_time_ms: 0.354
timesteps_total: 389000
training_iteration: 13

Versions / Dependencies

nightly

Reproduction script

learning_tests_impala_torch

Issue Severity

Low: It annoys or frustrates me.

scottsun94 avatar May 01 '23 18:05 scottsun94

@kouroshHakha @gjoliver Do you think this is a P1 issue (i.e., fix it before we expose the new design to users by default in 2.5)?

scottsun94 avatar May 01 '23 18:05 scottsun94

I think we should keep this in, as it's an important (and default) metric for schedulers and checkpoint management.
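
For example (a rough sketch, assuming the Ray 2.x Tune APIs), training_iteration is the default time_attr for schedulers like ASHA and is a common checkpoint-scoring attribute:

from ray.air import CheckpointConfig
from ray.tune.schedulers import ASHAScheduler

# time_attr defaults to "training_iteration", so schedulers keep relying on it
# even if the console line stops mentioning the iteration count.
scheduler = ASHAScheduler(time_attr="training_iteration", max_t=100)

# Checkpoint scoring can also key off training_iteration.
checkpoint_config = CheckpointConfig(
    num_to_keep=3,
    checkpoint_score_attribute="training_iteration",
)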

krfricke avatar May 02 '23 15:05 krfricke

Training finished iter 13 at 2023-03-29 09:08:13 (running for 00:03:11.52)

@krfricke Actually, I'm referring to the sentence before the reported results.

In the original design, we planned to use timesteps instead of iterations, something like this:

Training finished 1000 timesteps [359 timesteps/s] at 2023-02-24 12:35:39. Running time: 2min 14s

scottsun94 avatar May 02 '23 16:05 scottsun94

Ah I see.

I may try to tackle this after https://github.com/ray-project/ray/pull/34951 is merged.

krfricke avatar May 03 '23 14:05 krfricke

Actually, if it's ok, I'd like to punt this for later. We're basically targeting an RLlib-specific progress reporter here, and it's not easy to shoehorn the functionality in without introducing more advanced context management. I'm pretty sure we'll do this (see also the discussion in https://github.com/ray-project/ray/pull/35003), but until then, let's deprioritize this. Ok? A rough sketch of what I mean by an RLlib-specific reporter is below.

cc @sven1977 @kouroshHakha
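
To sketch the direction (a hypothetical Tune Callback, not the planned implementation, assuming the result dict carries the RLlib keys shown above):

from ray.tune import Callback

class TimestepProgressCallback(Callback):
    """Hypothetical sketch: print a timesteps-based line per reported result."""

    def on_trial_result(self, iteration, trials, trial, result, **info):
        # Assumes RLlib-style result keys; a generic reporter can't rely on these.
        rate = result["num_env_steps_sampled_this_iter"] / result["time_this_iter_s"]
        print(
            f"Finished {result['timesteps_total']} timesteps "
            f"[{rate:.0f} timesteps/s]. Running time: {result['time_total_s']:.0f}s"
        )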

krfricke avatar May 04 '23 13:05 krfricke

SGTM

scottsun94 avatar May 04 '23 15:05 scottsun94

This P2 issue has seen no activity in the past 2 years. It will be closed in 2 weeks as part of ongoing cleanup efforts.

Please comment and remove the pending-cleanup label if you believe this issue should remain open.

Thanks for contributing to Ray!

cszhu avatar Jun 17 '25 00:06 cszhu