[Tune] PB2 Checkpoint/Sync Path Compatibility with Windows
What happened + What you expected to happen
Hello, I am running on a Windows machine and I get warnings about the sync/checkpoint path, apparently because Windows paths use "\" (backslash) instead of "/". PB2 is able to load the checkpoints, but there are multiple warnings and an error related to the checkpoint or sync paths, which suggest they might be slowing down the training process.
(PPO pid=14200) 2023-06-13 14:51:09,810 ERROR trainable.py:671 -- Could not upload checkpoint to c://\Users\badrr\ray_results\PPO\PPO_OneCoinDiscrete_d110b_00003_3_clip_param=0.0000,entropy_coeff=0.0005,final_lr_div_for_fn=0.2576,final_lr_step_for_fn=1.5002,ga_2023-06-13_14-33-01 even after 3 retries. Please check if the credentials expired and that the remote filesystem is supported. For large checkpoints or artifacts, consider increasing SyncConfig(sync_timeout) (current value: 1800 seconds).
2023-06-13 14:53:20,547 WARNING tune.py:1085 -- Trial Runner checkpointing failed: Sync process failed: GetFileInfo() yielded path 'C:/Users/badrr/ray_results/PPO/PPO_OneCoinDiscrete_ad7f0_00000_0_clip_param=0.1000,entropy_coeff=0.0012,final_lr_div_for_fn=3.3564,final_lr_step_for_fn=0.3162,ga_2023-06-13_13-49-04', which is outside base dir 'C:\Users\badrr\ray_results\PPO'
However, as you can see, the PB2 scheduler is able to load the checkpoints:
(PPO pid=6336) 2023-06-13 14:52:42,836 INFO trainable.py:918 -- Restored on 127.0.0.1 from checkpoint: C:\Users\badrr\AppData\Local\Temp\checkpoint_tmp_e147a4feacba4d0dac0c131a89da09ab
(PPO pid=6336) 2023-06-13 14:52:42,837 INFO trainable.py:927 -- Current state after restoring: {'_iteration': 8, '_timesteps_total': None, '_time_total': 402.88116931915283, '_episodes_total': 20153}
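To show what I think the "outside base dir" warning boils down to, here is a minimal sketch of my own (not Ray's actual check, and the variable names are mine): a plain string-prefix comparison between the two paths fails on Windows because the same directory can appear with either separator, while comparing normalized paths succeeds.

import os

# The sync layer reports the trial path with "/" while the base dir uses "\".
base_dir = r"C:\Users\badrr\ray_results\PPO"
trial_path = "C:/Users/badrr/ray_results/PPO/PPO_OneCoinDiscrete_ad7f0_00000_0_clip_param=0.1000,entropy_coeff=0.0012,final_lr_div_for_fn=3.3564,final_lr_step_for_fn=0.3162,ga_2023-06-13_13-49-04"

# Naive prefix check -> False, i.e. the path looks "outside base dir".
print(trial_path.startswith(base_dir))

# After normalization (on Windows, normpath converts "/" to "\") -> True.
print(os.path.normpath(trial_path).startswith(os.path.normpath(base_dir)))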
Versions / Dependencies
OS: Windows 11
ray["all"] == 2.4.0
Reproduction script
import ray
from ray import tune
from ray.tune.schedulers.pb2 import PB2
from ray.rllib.algorithms.ppo import PPOConfig
ray.init()
config = PPOConfig().environment("CartPole-v1").framework("torch")\
    .training(gamma=tune.uniform(0.9, 0.999), lr=tune.uniform(1e-4, 1e-3), _enable_learner_api=True)\
    .rl_module(_enable_rl_module_api=True)\
    .rollouts(num_rollout_workers=1, num_envs_per_worker=1, create_env_on_local_worker=False)\
    .resources(num_cpus_for_local_worker=0, num_cpus_per_worker=2, num_gpus_per_learner_worker=0.25, num_cpus_per_learner_worker=0)

pb2_scheduler = PB2(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=2,
    hyperparam_bounds={
        "lr": [1e-4, 1e-3],
        "gamma": [0.9, 0.999],
    },
    quantile_fraction=0.5,
)

tune.run("PPO", scheduler=pb2_scheduler, num_samples=4, config=config, stop={"training_iteration": 5})
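As a side note, the warnings go away for me if I disable syncing altogether, which should be acceptable for a single-node Windows run. This is only a workaround sketch and assumes that tune.run in 2.4 still accepts a sync_config argument and that SyncConfig(syncer=None) turns syncing off:

from ray.tune.syncer import SyncConfig

# Workaround sketch (my assumption): with syncer=None, Tune skips the sync step,
# so the backslash/forward-slash path comparison is never hit.
tune.run(
    "PPO",
    scheduler=pb2_scheduler,
    num_samples=4,
    config=config,
    stop={"training_iteration": 5},
    sync_config=SyncConfig(syncer=None),
)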
Issue Severity
Low: It annoys or frustrates me.