AI-blog
Accompanying repository for the Let's make a DQN / A3C series.
I am trying to adapt your code to train the agent to play Breakout. I tried to use both the CartPole-basic file and the Seaquest-DDQN-PER file, but the...
First, what are EPS_START, EPS_STOP, and EPS_STEPS? If I want episodes to last until the game naturally terminates, how would I modify these? Could I just set EPS_STEPS...
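For context, constants with these names typically control a linearly annealed exploration rate rather than episode length, so they do not end episodes. A minimal sketch of how such annealing usually works, assuming the names from the question and illustrative values:

```python
# Minimal sketch of linear epsilon annealing (names taken from the question,
# values are illustrative assumptions, not the repo's actual settings).
EPS_START = 1.0    # initial exploration rate
EPS_STOP = 0.1     # final exploration rate
EPS_STEPS = 75000  # steps over which epsilon is annealed

def epsilon(step):
    """Linearly anneal epsilon from EPS_START to EPS_STOP over EPS_STEPS steps."""
    if step >= EPS_STEPS:
        return EPS_STOP
    return EPS_START + step * (EPS_STOP - EPS_START) / EPS_STEPS
```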
Given that the OpenAI Gym environment [MountainCar-v0](https://github.com/openai/gym/blob/master/gym/envs/classic_control/mountain_car.py) ALWAYS returns -1.0 as a reward (even when the goal is achieved), I don't understand how DQN with experience replay converges, yet I know it...
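A brief note on why this can still converge: the Q-target stops bootstrapping at terminal states, so states closer to the goal accumulate fewer -1 penalties and end up with higher (less negative) values. A minimal sketch of the standard DQN target, assuming a `model` whose `predict` returns Q-values for `next_state`:

```python
import numpy as np

GAMMA = 0.99

def dqn_target(reward, next_state, done, model):
    """Standard DQN target. Bootstrapping is cut off at terminal states,
    so even with a constant -1 reward, shorter paths to the goal yield
    higher Q-values, which is what makes the gradient informative."""
    if done:
        return reward  # episode ends: no further -1 penalties accrue
    return reward + GAMMA * np.max(model.predict(next_state))
```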
In line 202 of the A3C version, R is divided by GAMMA, but in line 211 it is not. According to your blog: R_1 = (R_0 - r_0 + gamma^n * r_n) / gamma. I think line...
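For reference, that formula is the sliding-window update of the n-step return: drop the oldest reward, append the newest one discounted by gamma^n, and divide by gamma to re-align the discount exponents. A minimal sketch, with GAMMA and N_STEP as assumed hyperparameters:

```python
GAMMA = 0.99
N_STEP = 8
GAMMA_N = GAMMA ** N_STEP

def shift_return(R, r_oldest, r_newest):
    """Slide the n-step return window by one step:
        R_1 = (R_0 - r_0 + gamma^n * r_n) / gamma
    Removing r_oldest and dividing by gamma shifts every remaining
    reward's discount down by one power; adding GAMMA_N * r_newest
    (before the division) appends the new reward at gamma^(n-1)."""
    return (R - r_oldest + GAMMA_N * r_newest) / GAMMA
```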
Although I tried and tested it in a terminal with Python 3.6, where it runs successfully, the indentation in "Files changed" looks a bit off.
The Brain class uses the global stateCnt and actionCnt instead of local ones.
The `threading` library used in the A3C example is not really concurrent. See https://docs.python.org/3/library/threading.html: > In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at...
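A truly parallel alternative would use processes rather than threads, since each process has its own interpreter and GIL. A minimal sketch, assuming a hypothetical `worker()` function standing in for one agent's training loop:

```python
from multiprocessing import Process

def worker(worker_id):
    # Each process runs in its own interpreter with its own GIL,
    # so Python bytecode executes in parallel across workers.
    print(f"worker {worker_id} running in its own process")

if __name__ == "__main__":
    procs = [Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

That said, threads can still overlap useful work here when time is spent inside the environment or inside TensorFlow ops, which release the GIL; the quoted limitation applies to pure-Python code.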
I could be wrong, but it does not seem that you are annealing the bias with importance sampling as suggested in the PER paper (Section 3.4): w_i = (1/N * 1/P(i))^beta...
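For reference, a minimal sketch of the importance-sampling weights the PER paper describes, assuming `priorities` is a NumPy array of the per-sample priorities p_i^alpha:

```python
import numpy as np

def is_weights(priorities, beta):
    """Importance-sampling weights from the PER paper (Section 3.4):
        w_i = (1/N * 1/P(i))^beta
    normalized by the maximum weight for stability, as the paper suggests."""
    probs = priorities / priorities.sum()  # sampling probabilities P(i)
    N = len(priorities)
    w = (N * probs) ** (-beta)             # equivalent to (1/N * 1/P(i))^beta
    return w / w.max()                     # scale so the largest weight is 1
```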
Hello, first of all, congrats on the article. I'm using it for study, and I'm trying to run your code to understand it better. So, I have some questions: - Do...