rlpyt issues

ReturnAverage and NonzeroRewardsAverage in logs become nan after a period of time.

Hi. I am new to rlpyt and I ran the dqn_async_gpu example. I looked at the logs at /data/local and I was trying to find the episode rewards. I suspect...

bryanyuan1

add Hindsight Experience Replay to replay buffers

13

Since HER applies to any off-policy algorithm, I think this would be useful for researchers studying sparse reward problems. I can take a crack at this and submit a...

MishaLaskin

Error on running GpuSampler/CpuSampler

4

My program fails with the following error when I try to use GpuSampler or CpuSampler (SerialSampler works normally): XIO: fatal IO error 11 (Resource temporarily unavailable) on X server "localhost:11.0"...

nazarblch

example_1 DQN won't learn Pong

I'm trying to run the examples as a sanity check but they don't seem to be learning. I tried examples 1 and 3. The only change I made to example...

TheExGenesis

ctrl.barrier_in.wait() waiting issue

I am now using multi-threading. When my training code terminates, it does not kill the process (pid). How can I solve this?

sungreong

How to get replays for training UL?

2

Hi, in the [rlpyt/rlpyt/ul](https://github.com/astooke/rlpyt/tree/master/rlpyt/ul) directory, the description says: "See "experiments" folder for scripts to run online RL agents with ATC ... and **other RL agents to gather expert demonstrations**." However, I...

kevinghst

definitelyuncertain

Why is .item() not called on grad norm like on other opt info fields?

1

In, e.g., PPO (though this also applies to at least A2C; I didn't check any others), when OptInfo is being populated, `.item` is called on most of the fields but...

neighthan


A question about CategoricalPgAgent

Reproducing figure 4 results from Decoupling Representation Learning from RL paper for atari games

Asynchronous runners with CPU only?
