
Reinforcement Learning in PyTorch

64 rlpyt issues

Hi, I'm trying to reproduce the work you did in "Decoupling Representation Learning from Reinforcement Learning". Could you kindly provide /data/adam/ul4rl/replays/20200608/15M_VecEps_B78/pong/run_0/replaybuffer.pkl so I can do so? Thanks, Harry

- Add DQN for non-frame state spaces (see the sketch below)
- Add loading and evaluation code for the trained model
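
For reference, a minimal sketch of what a Q-network for flat (non-image) observations might look like. This is plain PyTorch, not code from the PR; the three-argument forward follows rlpyt's usual model convention, and the agent wiring (e.g., pointing a DqnAgent subclass at this ModelCls) is omitted.

```python
import torch
import torch.nn as nn


class VectorDqnModel(nn.Module):
    """Sketch of a Q-network over flat vector observations."""

    def __init__(self, observation_dim, action_dim, hidden_sizes=(256, 256)):
        super().__init__()
        layers, last = [], observation_dim
        for size in hidden_sizes:
            layers += [nn.Linear(last, size), nn.ReLU()]
            last = size
        layers.append(nn.Linear(last, action_dim))
        self.q_mlp = nn.Sequential(*layers)

    def forward(self, observation, prev_action, prev_reward):
        # rlpyt models receive (observation, prev_action, prev_reward);
        # flatten any leading time/batch dims and restore them on the output.
        obs = observation.float()
        lead_shape = obs.shape[:-1]
        q = self.q_mlp(obs.reshape(-1, obs.shape[-1]))
        return q.reshape(*lead_shape, -1)
```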

This adds support for the V-MPO algorithm which was published at ICLR 2020 by DeepMind (https://openreview.net/forum?id=SylOlp4FvH). As far as I know, this is the first public implementation. Unfortunately, my computational...

Hello, when training an LSTM PPO agent, I was wondering whether there is a way to sample multiple batches of length batch_T in between PPO updates, _i.e._, batch_B > 1,...
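
In case it helps, here is a hedged sketch of how batch_T and batch_B are typically set on the sampler: each sampler iteration returns batch_B parallel sequences of length batch_T, and PPO optimizes on that whole batch before the next round of sampling. The specific classes (SerialSampler, MujocoLstmAgent, MinibatchRl) and hyperparameters are illustrative and may need adjusting for your setup.

```python
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.envs.gym import make as gym_make
from rlpyt.algos.pg.ppo import PPO
from rlpyt.agents.pg.mujoco import MujocoLstmAgent
from rlpyt.runners.minibatch_rl import MinibatchRl

# One sampler iteration yields data shaped [batch_T, batch_B, ...];
# PPO then runs its epochs/minibatches on that batch before sampling again.
sampler = SerialSampler(
    EnvCls=gym_make,
    env_kwargs=dict(id="Hopper-v3"),
    batch_T=256,  # time steps collected per environment per iteration
    batch_B=8,    # number of parallel environment instances
)
runner = MinibatchRl(
    algo=PPO(epochs=4, minibatches=4),
    agent=MujocoLstmAgent(),
    sampler=sampler,
    n_steps=1_000_000,
    log_interval_steps=10_000,
)
runner.train()
```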

The `workers_cpu` argument requires the user to know the CPU affinity of the process before it is started. It is possible that the manually assigned CPU is not in the...
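
One possible workaround, sketched below with caveats: query the CPUs the process is actually allowed to use at launch time instead of hard-coding IDs. The `workers_cpus`/`cuda_idx` key names follow my reading of rlpyt's affinity dicts (see rlpyt.utils.launching.affinity) and should be double-checked.

```python
import psutil

# Query the CPUs this process may actually run on (respects any external
# affinity/cgroup restrictions), rather than hard-coding CPU IDs up front.
allowed_cpus = psutil.Process().cpu_affinity()  # e.g. [0, 1, 2, 3]

affinity = dict(
    cuda_idx=0,                  # GPU index for the agent, if any
    workers_cpus=allowed_cpus,   # one sampler worker pinned per listed CPU
)
# ...then pass `affinity=affinity` to the runner as usual.
```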

Super minor, but the URL in the paper "Decoupling Representation Learning from Reinforcement Learning" is [https://github.com/astooke/rlpyt/rlpyt/ul](https://github.com/astooke/rlpyt/rlpyt/ul), which leads to "page not found". What works though is [https://github.com/astooke/rlpyt/**tree/master**/rlpyt/ul](https://github.com/astooke/rlpyt/tree/master/rlpyt/ul). P.S. Congrats on...

I really like this RL codebase. It's nice and clean! I was wondering whether there are any evaluation examples. We have `build and train`, but do we have...
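
There isn't an official snippet I can point to, but here is a rough sketch of one way to evaluate a trained model: load the snapshot the logger writes (commonly params.pkl), restore the weights, and roll the policy out greedily. The snapshot key names and the model's forward signature vary by agent, so treat everything here as an assumption to verify; it also uses the classic gym step API.

```python
import gym
import torch


def evaluate(model, env_id="CartPole-v1", episodes=10):
    """Greedy rollout of a trained model; returns the mean episode return.

    Assumes `model(obs_tensor)` yields per-action values/logits; rlpyt models
    actually take (observation, prev_action, prev_reward), so adapt the call.
    """
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, done, ep_return = env.reset(), False, 0.0
        while not done:
            with torch.no_grad():
                values = model(torch.as_tensor(obs, dtype=torch.float32))
            obs, reward, done, _ = env.step(int(values.argmax()))
            ep_return += reward
        returns.append(ep_return)
    return sum(returns) / len(returns)


# Restoring weights from an rlpyt snapshot (inspect the keys first; the
# 'agent_state_dict' name is what I have seen, but it may differ by runner):
# snapshot = torch.load("params.pkl", map_location="cpu")
# model.load_state_dict(snapshot["agent_state_dict"])
```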

I think it's more useful if the step in TensorBoard corresponds to the total number of environment steps. This makes it easier to compare algorithms with different replay ratios.
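
As a sketch of the convention being proposed (not rlpyt's built-in logging), one can pass the cumulative environment-step count as TensorBoard's global_step, so curves from algorithms with different replay ratios share an x-axis. `log_iteration` below is a hypothetical helper, not part of rlpyt.

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example")


def log_iteration(cum_env_steps, stats):
    # Use cumulative environment steps, not optimizer iterations, as the step.
    for name, value in stats.items():
        writer.add_scalar(name, value, global_step=cum_env_steps)


# Usage after each sampler/optimizer iteration:
#   cum_env_steps += batch_T * batch_B
#   log_iteration(cum_env_steps, {"AverageReturn": avg_return})
```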

Averaged results over 10 runs for PPO on Walker2d-v3: ![walker2dv3normtest](https://user-images.githubusercontent.com/10367284/79826905-ca6dbb00-8351-11ea-8a24-efcafad53fa7.png)

I'm using a SerialSampler with the default collector, and on some training iterations I'm getting 0 new completed trajectories along with 0 StepsInTrajWindow. Additionally, there simply isn't a DiscountedReturn line...