
PFRL: a PyTorch-based deep reinforcement learning library

41 pfrl issues, sorted by recently updated

Noticed that [here ](https://github.com/pfnet/pfrl/blob/master/pfrl/agents/soft_actor_critic.py#L281) the `log_prob` variable is computed before the update of the actor, while in SAC's reference [repo ](https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py#L246) it is recomputed after the actor update (the [paper](https://arxiv.org/abs/1812.05905) also...
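
For context, a minimal sketch of the two orderings, assuming illustrative helper names (`q_min` for the clipped double-Q minimum, `log_alpha` for the log temperature, `target_entropy`); this is not pfrl's actual code:

```python
import torch

def actor_and_temperature_step(policy, q_min, log_alpha, target_entropy,
                               actor_opt, alpha_opt, batch_obs):
    """Illustrative SAC update showing where log_prob is (re)computed."""
    distrib = policy(batch_obs)
    actions = distrib.rsample()
    log_prob = distrib.log_prob(actions)

    # Actor update uses log_prob from the pre-update policy.
    actor_loss = (log_alpha.exp().detach() * log_prob
                  - q_min(batch_obs, actions)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # pfrl reuses `log_prob` from above for the temperature loss; the
    # softlearning reference instead recomputes it after the actor step:
    with torch.no_grad():
        distrib = policy(batch_obs)
        log_prob = distrib.log_prob(distrib.rsample())

    alpha_loss = -(log_alpha * (log_prob + target_entropy)).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
```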

Gym v26 makes a number of changes to the core API, `reset` and `step` in particular, along with `render` and `seed`. Could you either pin the gym version to `
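
For reference, the core API differences in gym v26 look roughly like this:

```python
import gym

# Old API (gym < 0.26):
#   obs = env.reset()
#   obs, reward, done, info = env.step(action)
#   env.seed(0); env.render()
#
# New API (gym >= 0.26):
env = gym.make("CartPole-v1", render_mode="rgb_array")  # render mode set at creation
obs, info = env.reset(seed=0)  # seeding moved into reset(), which now returns info
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
done = terminated or truncated  # the old `done` flag is split in two
```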

The Monitor from gym.wrappers has been deprecated. At the moment, you can't even `import pfrl`; you are immediately greeted with this: `ImportError: cannot import name 'Monitor' from 'gym.wrappers'`
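
A common workaround is to switch to the wrappers that superseded `Monitor`:

```python
import gym
from gym.wrappers import RecordVideo, RecordEpisodeStatistics

env = gym.make("CartPole-v1", render_mode="rgb_array")
env = RecordVideo(env, video_folder="videos")  # replaces Monitor's video recording
env = RecordEpisodeStatistics(env)             # replaces Monitor's episode stats
```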

Hi, I'm trying to set up a DQN agent with a graph attention layer. The agent can take one of 3 actions. For some reason, when I run the training function,...
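
For comparison, a minimal 3-action DQN setup following pfrl's quickstart pattern; a plain MLP stands in for the graph attention layer here, and `obs_size` is an assumed flattened observation size:

```python
import numpy as np
import torch
import torch.nn as nn
import pfrl

n_actions = 3
obs_size = 16  # assumed flattened observation size

q_func = nn.Sequential(
    nn.Linear(obs_size, 64),  # a graph attention layer would go here
    nn.ReLU(),
    nn.Linear(64, n_actions),
    pfrl.q_functions.DiscreteActionValueHead(),  # wraps outputs as action values
)

agent = pfrl.agents.DQN(
    q_func,
    torch.optim.Adam(q_func.parameters(), eps=1e-2),
    pfrl.replay_buffers.ReplayBuffer(capacity=10 ** 5),
    gamma=0.99,
    explorer=pfrl.explorers.ConstantEpsilonGreedy(
        epsilon=0.1, random_action_func=lambda: np.random.randint(n_actions)
    ),
    replay_start_size=500,
    update_interval=1,
    target_update_interval=100,
)
```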

Current pfrl does not support snapshots of training, which are important in many job systems such as Kubernetes. This PR supports saving and loading snapshots, including the replay buffer. ## Done...
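
A hedged sketch of what resuming could look like, assuming an `agent` and `replay_buffer` like those above and file-based `save`/`load` methods on the replay buffer (the exact snapshot API added by this PR may differ):

```python
import os

snapshot_dir = "snapshot"
os.makedirs(snapshot_dir, exist_ok=True)

# On preemption: persist both the agent (model + optimizer state) and
# the replay buffer, so training can resume where it left off.
agent.save(snapshot_dir)
replay_buffer.save(os.path.join(snapshot_dir, "replay_buffer.pkl"))

# On restart:
agent.load(snapshot_dir)
replay_buffer.load(os.path.join(snapshot_dir, "replay_buffer.pkl"))
```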

In Python 3.8, the default multiprocessing start method on macOS was changed from `fork` to `spawn`. For reference: https://github.com/chainer/chainerrl/issues/572
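
Code that relied on the old default can request `fork` explicitly, with the caveat that `fork` is considered unsafe on macOS (which is why the default changed):

```python
import multiprocessing as mp

if __name__ == "__main__":
    # Opt back into the pre-3.8 macOS behavior globally...
    mp.set_start_method("fork")
    # ...or, less invasively, use a fork context only where needed:
    ctx = mp.get_context("fork")
    with ctx.Pool(2) as pool:
        print(pool.map(abs, [-1, -2]))
```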

Hindsight Experience Replay with bit-flipping example: https://arxiv.org/abs/1707.01495
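
For reference, a minimal sketch of the paper's bit-flipping environment: the observation is the current n bits plus an n-bit goal, each action flips one bit, and the reward is sparse:

```python
import numpy as np

class BitFlipEnv:
    """Illustrative bit-flipping environment from the HER paper."""

    def __init__(self, n=10):
        self.n = n

    def reset(self):
        self.state = np.random.randint(2, size=self.n)
        self.goal = np.random.randint(2, size=self.n)
        self.t = 0
        return np.concatenate([self.state, self.goal])

    def step(self, action):
        self.state[action] ^= 1  # flip one bit
        self.t += 1
        success = np.array_equal(self.state, self.goal)
        reward = 0.0 if success else -1.0  # sparse reward, as in the paper
        done = success or self.t >= self.n
        return np.concatenate([self.state, self.goal]), reward, done, {}
```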

Depends on https://github.com/pfnet/pfrl/pull/80. Resolves https://github.com/pfnet/pfrl/issues/6. Results: ![her_bit_flip_dqn](https://user-images.githubusercontent.com/10005453/97736859-a0c2f700-1b1f-11eb-85ce-5fcf8e69d4dd.png)

I have a custom environment with a [MultiDiscrete](https://github.com/openai/gym/blob/master/gym/spaces/multi_discrete.py) action space. A MultiDiscrete space allows controlling an agent with an n-dimensional vector of discrete sub-actions. In my environment, I have 4 dimensions...
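
One common workaround for value-based agents that expect a single `Discrete` space is to enumerate the Cartesian product of the sub-actions (a mixed-radix encoding); a sketch:

```python
import numpy as np
import gym

space = gym.spaces.MultiDiscrete([3, 3, 3, 3])  # 4 dims, 3 choices each
n_flat = int(np.prod(space.nvec))               # 81 combined actions

def flat_to_multi(index):
    """Decode a flat Discrete index back into a MultiDiscrete action."""
    digits = []
    for n in reversed(space.nvec):
        index, rem = divmod(index, n)
        digits.append(rem)
    return np.array(digits[::-1])

assert (flat_to_multi(0) == [0, 0, 0, 0]).all()
assert (flat_to_multi(80) == [2, 2, 2, 2]).all()
```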