
Reinforcement Learning in PyTorch

64 rlpyt issues

Hi, I'm trying to reproduce the work you did in "Decoupling Representation Learning from Reinforcement Learning". Could you kindly provide /data/adam/ul4rl/replays/20200608/15M_VecEps_B78/pong/run_0/replaybuffer.pkl so I can do so? Thanks, Harry

- Add DQN for non-frame state spaces (see the sketch below)
- Add loading and evaluation code for the trained model
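
For reference, a minimal sketch of what a Q-network for flat (non-image) observations might look like. This is plain PyTorch, not code from the PR; the three-argument forward follows rlpyt's usual model convention, and the agent wiring (e.g., pointing a DqnAgent subclass at this ModelCls) is omitted.

```python
import torch
import torch.nn as nn


class VectorDqnModel(nn.Module):
    """Sketch of a Q-network over flat vector observations."""

    def __init__(self, observation_dim, action_dim, hidden_sizes=(256, 256)):
        super().__init__()
        layers, last = [], observation_dim
        for size in hidden_sizes:
            layers += [nn.Linear(last, size), nn.ReLU()]
            last = size
        layers.append(nn.Linear(last, action_dim))
        self.q_mlp = nn.Sequential(*layers)

    def forward(self, observation, prev_action, prev_reward):
        # rlpyt models receive (observation, prev_action, prev_reward);
        # flatten any leading time/batch dims and restore them on the output.
        obs = observation.float()
        lead_shape = obs.shape[:-1]
        q = self.q_mlp(obs.reshape(-1, obs.shape[-1]))
        return q.reshape(*lead_shape, -1)
```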

This adds support for the V-MPO algorithm which was published at ICLR 2020 by DeepMind (https://openreview.net/forum?id=SylOlp4FvH). As far as I know, this is the first public implementation. Unfortunately, my computational...

Hello, when training an LSTM PPO agent, I was wondering whether there is a way to sample multiple batches of length batch_T in between PPO updates, _i.e._, batch_B > 1,...
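
In case it helps, here is a hedged sketch of how batch_T and batch_B are typically set on the sampler: each sampler iteration returns batch_B parallel sequences of length batch_T, and PPO optimizes on that whole batch before the next round of sampling. The specific classes (SerialSampler, MujocoLstmAgent, MinibatchRl) and hyperparameters are illustrative and may need adjusting for your setup.

```python
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.envs.gym import make as gym_make
from rlpyt.algos.pg.ppo import PPO
from rlpyt.agents.pg.mujoco import MujocoLstmAgent
from rlpyt.runners.minibatch_rl import MinibatchRl

# One sampler iteration yields data shaped [batch_T, batch_B, ...];
# PPO then runs its epochs/minibatches on that batch before sampling again.
sampler = SerialSampler(
    EnvCls=gym_make,
    env_kwargs=dict(id="Hopper-v3"),
    batch_T=256,  # time steps collected per environment per iteration
    batch_B=8,    # number of parallel environment instances
)
runner = MinibatchRl(
    algo=PPO(epochs=4, minibatches=4),
    agent=MujocoLstmAgent(),
    sampler=sampler,
    n_steps=1_000_000,
    log_interval_steps=10_000,
)
runner.train()
```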

The `workers_cpu` argument requires the user to know the CPU affinity of the process before it is started. It is possible that the manually assigned CPU is not in the...
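
One possible workaround, sketched below with caveats: query the CPUs the process is actually allowed to use at launch time instead of hard-coding IDs. The `workers_cpus`/`cuda_idx` key names follow my reading of rlpyt's affinity dicts (see rlpyt.utils.launching.affinity) and should be double-checked.

```python
import psutil

# Query the CPUs this process may actually run on (respects any external
# affinity/cgroup restrictions), rather than hard-coding CPU IDs up front.
allowed_cpus = psutil.Process().cpu_affinity()  # e.g. [0, 1, 2, 3]

affinity = dict(
    cuda_idx=0,                  # GPU index for the agent, if any
    workers_cpus=allowed_cpus,   # one sampler worker pinned per listed CPU
)
# ...then pass `affinity=affinity` to the runner as usual.
```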

Super minor, but the URL in the paper "Decoupling Representation Learning from Reinforcement Learning" is [https://github.com/astooke/rlpyt/rlpyt/ul](https://github.com/astooke/rlpyt/rlpyt/ul), which leads to "page not found". What works though is [https://github.com/astooke/rlpyt/**tree/master**/rlpyt/ul](https://github.com/astooke/rlpyt/tree/master/rlpyt/ul). P.S. Congrats on...

I really like this RL codebase. It's nice and clean! I was wondering whether there are any evaluation examples. We have `build and train`, but do we have...
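
There isn't an official snippet I can point to, but here is a rough sketch of one way to evaluate a trained model: load the snapshot the logger writes (commonly params.pkl), restore the weights, and roll the policy out greedily. The snapshot key names and the model's forward signature vary by agent, so treat everything here as an assumption to verify; it also uses the classic gym step API.

```python
import gym
import torch


def evaluate(model, env_id="CartPole-v1", episodes=10):
    """Greedy rollout of a trained model; returns the mean episode return.

    Assumes `model(obs_tensor)` yields per-action values/logits; rlpyt models
    actually take (observation, prev_action, prev_reward), so adapt the call.
    """
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, done, ep_return = env.reset(), False, 0.0
        while not done:
            with torch.no_grad():
                values = model(torch.as_tensor(obs, dtype=torch.float32))
            obs, reward, done, _ = env.step(int(values.argmax()))
            ep_return += reward
        returns.append(ep_return)
    return sum(returns) / len(returns)


# Restoring weights from an rlpyt snapshot (inspect the keys first; the
# 'agent_state_dict' name is what I have seen, but it may differ by runner):
# snapshot = torch.load("params.pkl", map_location="cpu")
# model.load_state_dict(snapshot["agent_state_dict"])
```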

I think it's more useful if the step in TensorBoard corresponds to the total number of environment steps. This makes it easier to compare algorithms with different replay ratios.
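
As a sketch of the convention being proposed (not rlpyt's built-in logging), one can pass the cumulative environment-step count as TensorBoard's global_step, so curves from algorithms with different replay ratios share an x-axis. `log_iteration` below is a hypothetical helper, not part of rlpyt.

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example")


def log_iteration(cum_env_steps, stats):
    # Use cumulative environment steps, not optimizer iterations, as the step.
    for name, value in stats.items():
        writer.add_scalar(name, value, global_step=cum_env_steps)


# Usage after each sampler/optimizer iteration:
#   cum_env_steps += batch_T * batch_B
#   log_iteration(cum_env_steps, {"AverageReturn": avg_return})
```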

Averaged results over 10 runs for PPO on Walker2d-v3: ![walker2dv3normtest](https://user-images.githubusercontent.com/10367284/79826905-ca6dbb00-8351-11ea-8a24-efcafad53fa7.png)

I'm using a SerialSampler with the default collector, and on some training iterations I'm getting 0 new completed trajectories along with 0 StepsInTrajWindow. Additionally, there simply isn't a DiscountedReturn line...