Ashish Rao
Hello, I am trying to save video files of runs while training with the following command: `python -m baselines.run --alg=ppo2 --env=InvertedPendulum-v2 --num_timesteps=1e5 --save_video_interval=33333` However, I get the error: `ERROR:...
📝 Summary of Changes. This PR adds three optimizations to the `CrossHost{Send,Receive}Buffers` methods in `StreamExecutorGpuClient`: 1. Communicators are cached for re-use across transfers with `AcquireGpuClique`. 2. Transfers of multiple arrays...
**Adds parameters `min_slice_bytes_for_replica_parallel` and `max_replicas_for_replica_parallel` to `ArrayHandler` to allow users to configure replica-parallel checkpoint saving.** If `use_replica_parallel` is set to true, saving will be parallelized over at most `max_replicas_for_replica_parallel` different...
This PR modifies `_batched_device_put_impl` to batch the cross-host data transfers of multiple arrays. This enables us to take advantage of recent optimizations inside XLA for cross-host data transfers, particularly on...