simonsays1980
## Why are these changes needed? This PR adds prioritized sampling for multi-agent setups. It implements `"independent"` sampling by holding sum- and min-segments for each module and updating them accordingly...
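A rough, hypothetical sketch of the `"independent"` idea (not RLlib's actual buffer classes): one priority structure per module ID, sampled and updated on its own. Only the sum/proportional part is shown; the min-segment used for importance-weight normalization is omitted, and a real implementation would use segment trees rather than plain lists.

```python
import random
from collections import defaultdict


class ModuleBuffer:
    """Per-module storage with priorities (simplified; segment trees make this O(log N))."""

    def __init__(self, alpha: float = 0.6):
        self.alpha = alpha
        self.items, self.priorities = [], []

    def add(self, item, priority: float = 1.0):
        self.items.append(item)
        self.priorities.append(priority ** self.alpha)

    def sample(self, k: int):
        # Proportional (prioritized) sampling over this module's items only.
        return random.choices(self.items, weights=self.priorities, k=k)

    def update_priorities(self, indices, new_priorities):
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = p ** self.alpha


# One buffer per module; each is sampled and updated independently.
buffers = defaultdict(ModuleBuffer)
buffers["policy_0"].add({"obs": 1, "td_error": 0.5}, priority=0.5)
buffers["policy_1"].add({"obs": 2, "td_error": 2.0}, priority=2.0)
samples = {mid: buf.sample(k=2) for mid, buf in buffers.items()}
```

Keeping the structures per module means one module's TD-errors never skew the sampling distribution of another module.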
## Why are these changes needed? The user defines the window size for metrics in `metrics_num_episodes_for_smoothing`. This needs to be applied to all episode metrics to keep them consistent. This PR...
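For reference, the window is set on the config's reporting options; the environment and value below are arbitrary examples:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Smooth all episode metrics (return, length, ...) over the same
    # window of the last 100 episodes.
    .reporting(metrics_num_episodes_for_smoothing=100)
)
```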
## Why are these changes needed? When sampling complete episodes, each `EnvRunner` sampled `train_batch_size` before returning. This made sampling inefficient and led to long waiting times in case of slow environments...
### What happened + What you expected to happen # What happened I ran the TensorFlow PPO algorithm on a problem that gave me some unstable gradients. I wanted to...
### What happened + What you expected to happen # What happened I ran the script below to read in data in zipped JSONL format and ran into this error:...
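The failing script itself is not reproduced here, but for context, the documented way to read gzip-compressed JSONL with Ray Data passes the compression through `arrow_open_stream_args` (the path below is hypothetical):

```python
import ray

ds = ray.data.read_json(
    "s3://my-bucket/logs/data.jsonl.gz",  # hypothetical path
    arrow_open_stream_args={"compression": "gzip"},
)
print(ds.take(1))
```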
## Why are these changes needed? So far, the output/write arguments already allowed users to define cloud filesystems (like GCS, S3, ABS) to write to. This PR proposes the same...
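As a hedged illustration of the underlying pattern (not this PR's exact API), a `pyarrow` filesystem object can be handed to the read path the same way the write path already accepts one; bucket and path are hypothetical:

```python
import pyarrow.fs as pafs
import ray

# GCS here; S3FileSystem works analogously, ABS via an fsspec-wrapped filesystem.
fs = pafs.GcsFileSystem()
ds = ray.data.read_parquet("my-bucket/offline-data/", filesystem=fs)
```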
## Why are these changes needed? Right now, the new Offline RL stack does not allow using old-stack recorded data. Many users have costly recorded data from the old...
## Why are these changes needed? Storing episodes as instances with `ray.data` results in pickled instances that may not be compatible with later Python versions. This PR tries to develop...
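A minimal sketch of the alternative direction, writing episode data as plain columns so the files stay readable across Python versions; the column names and output path are hypothetical:

```python
import ray

rows = [
    {"eps_id": 0, "obs": [0.1, 0.2], "action": 1, "reward": 1.0},
    {"eps_id": 0, "obs": [0.3, 0.4], "action": 0, "reward": 0.0},
]
# Columnar Parquet instead of pickled episode objects.
ray.data.from_items(rows).write_parquet("/tmp/offline_episodes")
```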
## Why are these changes needed? At the moment, the `OfflinePreLearner` samples recorded episodes or `SampleBatch`es from a `ray.data` dataset and then adds them to a buffer which coordinates the...
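Roughly, the described flow can be pictured as a stateful callable mapped over the dataset that buffers incoming rows and cuts fixed-size train batches; this is a simplified, hypothetical sketch, not RLlib's actual `OfflinePreLearner`:

```python
import ray


class PreLearner:
    """Buffers incoming rows and emits fixed-size train batches (simplified)."""

    def __init__(self, batch_size: int = 4):
        self.batch_size = batch_size
        self.buffer = []

    def __call__(self, batch: dict) -> dict:
        # Buffer incoming rows, then cut one train batch; leftover rows stay
        # buffered for the next call (and are dropped at the end, for simplicity).
        self.buffer.extend(zip(batch["obs"], batch["action"]))
        out, self.buffer = self.buffer[: self.batch_size], self.buffer[self.batch_size:]
        return {
            "obs": [o for o, _ in out],
            "action": [a for _, a in out],
        }


ds = ray.data.from_items([{"obs": float(i), "action": i % 2} for i in range(16)])
train_batches = ds.map_batches(PreLearner, batch_size=8, concurrency=1)
print(train_batches.take(2))
```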
## Why are these changes needed? The autoregressive-actions example was flaky (see #47876) and could be simplified (as PPO only backpropagates through the log-probabilities). This PR suggests a simplified solution...
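A minimal torch sketch of that point, with hypothetical head names: the second action head is conditioned on the sampled first action, and the only quantity PPO's surrogate loss needs gradients through is the summed log-probability:

```python
import torch
from torch.distributions import Categorical


class AutoregressiveHeads(torch.nn.Module):
    def __init__(self, hidden: int = 32, n_a1: int = 3, n_a2: int = 4):
        super().__init__()
        self.n_a1 = n_a1
        self.head_a1 = torch.nn.Linear(hidden, n_a1)
        # The second head sees the state features plus a one-hot of a1.
        self.head_a2 = torch.nn.Linear(hidden + n_a1, n_a2)

    def forward(self, feats: torch.Tensor):
        dist_a1 = Categorical(logits=self.head_a1(feats))
        a1 = dist_a1.sample()
        a1_one_hot = torch.nn.functional.one_hot(a1, self.n_a1).float()
        dist_a2 = Categorical(
            logits=self.head_a2(torch.cat([feats, a1_one_hot], dim=-1))
        )
        a2 = dist_a2.sample()
        # PPO's surrogate loss only needs this joint log-prob to backprop through.
        logp = dist_a1.log_prob(a1) + dist_a2.log_prob(a2)
        return (a1, a2), logp


heads = AutoregressiveHeads()
actions, logp = heads(torch.randn(5, 32))
```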