pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".

Results 24 pytorch-a3c issues

Fixes issue #66 (env wrappers need to implement observation())

You mentioned that A2C is strongly suggested except for specific reasons. So if I need to run it with distributed processing (actually only for collecting data in real time),...

The `while True:` loop at https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py#L35 can never be exited, because the only `break` statement is at https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py#L79, and it only breaks out of the for-loop at https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py#L50. How can that infinite while-loop be terminated in...
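The control-flow point raised in this issue can be reproduced in a small, self-contained sketch. This is not the repository's `train.py`: the function `run_worker`, its parameters, and the toy "environment" (a list of episode lengths) are all hypothetical and only mirror the nested-loop shape described above. It shows that an inner `break` leaves only the for-loop, and that one possible way to end the outer loop is to give it an explicit step budget.

```python
import itertools


def run_worker(max_total_steps, episode_lengths, num_steps=20):
    """Hypothetical worker loop (not the repo's code) with an exit condition.

    episode_lengths stands in for an environment: each entry is the number
    of steps before a toy episode reports done.
    """
    total_steps = 0
    updates = 0
    episodes = itertools.cycle(episode_lengths)
    remaining = next(episodes)

    # The outer loop checks a step budget instead of running `while True:`.
    while total_steps < max_total_steps:
        for _ in range(num_steps):  # one rollout of at most num_steps steps
            total_steps += 1
            remaining -= 1
            if remaining == 0:  # toy episode finished
                remaining = next(episodes)  # "reset" the toy environment
                break  # exits only the for-loop, not the while-loop
        updates += 1  # stands in for one gradient update per rollout

    return total_steps, updates


print(run_worker(max_total_steps=50, episode_lengths=[7]))
```

Because the budget is checked only between rollouts, the worker can overshoot it by up to one rollout. With several worker processes, the counter would have to be shared between them (for example via `multiprocessing.Value`), which this single-process sketch leaves out.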

Hi, the [original paper](https://arxiv.org/pdf/1602.01783.pdf) mentions that it uses multiple threads. But I see that your code uses multiple processes. As far as I know, these two methods...

I'm sorry to ask a simple question. I don't know the difference between 'Pong-v4' and 'PongDeterministic-v4'. And why do you use the latter environment to test your algorithm instead of...

I am somewhat confused by `ensure_shared_grads` here: https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py#L13. Here, the `grad` is synced only when it is `None`. I think we need to set `shared_param._grad = param.grad` all...
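One way to reason about the question is with a toy model of aliasing. The sketch below uses plain Python lists as gradient buffers and a hypothetical `Param` class; it is not PyTorch and not the repository's code, and the attribute is called `grad` rather than `_grad` for simplicity. Under the assumption that gradients are updated in place, a one-time assignment is enough, because both names then refer to the same buffer. If the local buffer is replaced by a new object instead, the alias goes stale, which is the situation the issue is worried about.

```python
class Param:
    """Toy stand-in for a parameter with a gradient buffer (not a torch class)."""

    def __init__(self):
        self.grad = None


def ensure_shared_grads(local_params, shared_params):
    """Assign gradients only while the shared gradient is still None."""
    for param, shared_param in zip(local_params, shared_params):
        if shared_param.grad is not None:
            return  # already assigned once; skip everything
        shared_param.grad = param.grad  # both names now refer to one buffer


local, shared = [Param()], [Param()]
local[0].grad = [0.0]
ensure_shared_grads(local, shared)

# Case 1: in-place update. The shared side sees it through the alias.
local[0].grad[0] = 2.0
ensure_shared_grads(local, shared)
print(shared[0].grad)  # same list object as local[0].grad

# Case 2: the local buffer is replaced by a new object. The alias is stale.
local[0].grad = [9.0]
ensure_shared_grads(local, shared)
print(shared[0].grad is local[0].grad)
```

Whether real PyTorch gradients behave like case 1 or case 2 depends on the version and on how gradients are zeroed, so this toy model only frames the question and does not settle it.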

Very sorry to ask you a simple question, thanks a lot.

Hi, I made some small changes to clear all warnings related to old versions of torch and gym. I also added TensorBoard to the tester agent in order to monitor the learning process...

Hi, today I ran the code and found that when no-shared=False, the process blocks. Do you have any suggestions to fix that? Thanks!

Hi, I added simple logging with tensorboard logger (no dependency on tensorflow). If you want to keep it simple and minimal, it's OK to reject :) Training time here is around...