[Tune] PB2 Checkpoint/Sync Path Compatibility with Windows
What happened + What you expected to happen
Hello, I am running on a Windows machine and I get warnings about the sync/checkpoint path, apparently because Windows paths use "\" (backslash) instead of "/". PB2 is able to load the checkpoints, but there are multiple warnings and an error related to the checkpoint or sync paths, which suggest they might be slowing down the training process.
(PPO pid=14200) 2023-06-13 14:51:09,810 ERROR trainable.py:671 -- Could not upload checkpoint to c://\Users\badrr\ray_results\PPO\PPO_OneCoinDiscrete_d110b_00003_3_clip_param=0.0000,entropy_coeff=0.0005,final_lr_div_for_fn=0.2576,final_lr_step_for_fn=1.5002,ga_2023-06-13_14-33-01 even after 3 retries. Please check if the credentials expired and that the remote filesystem is supported. For large checkpoints or artifacts, consider increasing SyncConfig(sync_timeout) (current value: 1800 seconds).
2023-06-13 14:53:20,547 WARNING tune.py:1085 -- Trial Runner checkpointing failed: Sync process failed: GetFileInfo() yielded path 'C:/Users/badrr/ray_results/PPO/PPO_OneCoinDiscrete_ad7f0_00000_0_clip_param=0.1000,entropy_coeff=0.0012,final_lr_div_for_fn=3.3564,final_lr_step_for_fn=0.3162,ga_2023-06-13_13-49-04', which is outside base dir 'C:\Users\badrr\ray_results\PPO'
However, as you can see, the PB2 scheduler is able to load the checkpoints:
(PPO pid=6336) 2023-06-13 14:52:42,836 INFO trainable.py:918 -- Restored on 127.0.0.1 from checkpoint: C:\Users\badrr\AppData\Local\Temp\checkpoint_tmp_e147a4feacba4d0dac0c131a89da09ab
(PPO pid=6336) 2023-06-13 14:52:42,837 INFO trainable.py:927 -- Current state after restoring: {'_iteration': 8, '_timesteps_total': None, '_time_total': 402.88116931915283, '_episodes_total': 20153}
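To show what I think the "outside base dir" warning boils down to, here is a minimal sketch of my own (not Ray's actual check, and the variable names are mine): a plain string-prefix comparison between the two paths fails on Windows because the same directory can appear with either separator, while comparing normalized paths succeeds.

import os

# The sync layer reports the trial path with "/" while the base dir uses "\".
base_dir = r"C:\Users\badrr\ray_results\PPO"
trial_path = "C:/Users/badrr/ray_results/PPO/PPO_OneCoinDiscrete_ad7f0_00000_0_clip_param=0.1000,entropy_coeff=0.0012,final_lr_div_for_fn=3.3564,final_lr_step_for_fn=0.3162,ga_2023-06-13_13-49-04"

# Naive prefix check -> False, i.e. the path looks "outside base dir".
print(trial_path.startswith(base_dir))

# After normalization (on Windows, normpath converts "/" to "\") -> True.
print(os.path.normpath(trial_path).startswith(os.path.normpath(base_dir)))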
Versions / Dependencies
OS: Windows 11
ray["all"] == 2.4.0
Reproduction script
import ray
from ray import tune
from ray.tune.schedulers.pb2 import PB2
from ray.rllib.algorithms.ppo import PPOConfig
ray.init()
config = PPOConfig().environment("CartPole-v1").framework("torch")\
    .training(gamma=tune.uniform(0.9, 0.999), lr=tune.uniform(1e-4, 1e-3), _enable_learner_api=True)\
    .rl_module(_enable_rl_module_api=True)\
    .rollouts(num_rollout_workers=1, num_envs_per_worker=1, create_env_on_local_worker=False)\
    .resources(num_cpus_for_local_worker=0, num_cpus_per_worker=2, num_gpus_per_learner_worker=0.25, num_cpus_per_learner_worker=0)

pb2_scheduler = PB2(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=2,
    hyperparam_bounds={
        "lr": [1e-4, 1e-3],
        "gamma": [0.9, 0.999],
    },
    quantile_fraction=0.5,
)

tune.run("PPO", scheduler=pb2_scheduler, num_samples=4, config=config, stop={"training_iteration": 5})
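As a side note, the warnings go away for me if I disable syncing altogether, which should be acceptable for a single-node Windows run. This is only a workaround sketch and assumes that tune.run in 2.4 still accepts a sync_config argument and that SyncConfig(syncer=None) turns syncing off:

from ray.tune.syncer import SyncConfig

# Workaround sketch (my assumption): with syncer=None, Tune skips the sync step,
# so the backslash/forward-slash path comparison is never hit.
tune.run(
    "PPO",
    scheduler=pb2_scheduler,
    num_samples=4,
    config=config,
    stop={"training_iteration": 5},
    sync_config=SyncConfig(syncer=None),
)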
Issue Severity
Low: It annoys or frustrates me.