AI-blog
Accompanying repository for the Let's make a DQN / A3C series.
I am trying to adapt your code to train the agent to play Breakout. I tried to use both the CartPole-basic file and the Seaquest-DDQN-PER file, but the...
First, what are EPS_START, EPS_STOP, and EPS_STEPS? If I want episodes to last until the game naturally terminates, how would I modify these? Could I just set EPS_STEPS...
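For context, constants with these names typically control a linearly annealed exploration rate rather than episode length, so they do not end episodes. A minimal sketch of how such annealing usually works, assuming the names from the question and illustrative values:

```python
# Minimal sketch of linear epsilon annealing (names taken from the question,
# values are illustrative assumptions, not the repo's actual settings).
EPS_START = 1.0    # initial exploration rate
EPS_STOP = 0.1     # final exploration rate
EPS_STEPS = 75000  # steps over which epsilon is annealed

def epsilon(step):
    """Linearly anneal epsilon from EPS_START to EPS_STOP over EPS_STEPS steps."""
    if step >= EPS_STEPS:
        return EPS_STOP
    return EPS_START + step * (EPS_STOP - EPS_START) / EPS_STEPS
```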
Given that the OpenAI Gym environment [MountainCar-v0](https://github.com/openai/gym/blob/master/gym/envs/classic_control/mountain_car.py) ALWAYS returns -1.0 as a reward (even when the goal is achieved), I don't understand how DQN with experience replay converges, yet I know it...
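A brief note on why this can still converge: the Q-target stops bootstrapping at terminal states, so states closer to the goal accumulate fewer -1 penalties and end up with higher (less negative) values. A minimal sketch of the standard DQN target, assuming a `model` whose `predict` returns Q-values for `next_state`:

```python
import numpy as np

GAMMA = 0.99

def dqn_target(reward, next_state, done, model):
    """Standard DQN target. Bootstrapping is cut off at terminal states,
    so even with a constant -1 reward, shorter paths to the goal yield
    higher Q-values, which is what makes the gradient informative."""
    if done:
        return reward  # episode ends: no further -1 penalties accrue
    return reward + GAMMA * np.max(model.predict(next_state))
```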
In line 202 of the A3C version, R is divided by GAMMA, but in line 211 it is not. According to your blog: R_1 = (R_0 - r_0 + gamma^n * r_n) / gamma. I think line...
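For reference, that formula is the sliding-window update of the n-step return: drop the oldest reward, append the newest one discounted by gamma^n, and divide by gamma to re-align the discount exponents. A minimal sketch, with GAMMA and N_STEP as assumed hyperparameters:

```python
GAMMA = 0.99
N_STEP = 8
GAMMA_N = GAMMA ** N_STEP

def shift_return(R, r_oldest, r_newest):
    """Slide the n-step return window by one step:
        R_1 = (R_0 - r_0 + gamma^n * r_n) / gamma
    Removing r_oldest and dividing by gamma shifts every remaining
    reward's discount down by one power; adding GAMMA_N * r_newest
    (before the division) appends the new reward at gamma^(n-1)."""
    return (R - r_oldest + GAMMA_N * r_newest) / GAMMA
```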
Although I tried and tested it in a terminal with Python 3.6, where it runs successfully, the indentation in "Files changed" looks a bit off.
The Brain class uses the global stateCnt and actionCnt instead of local ones.
The `threading` library used in the A3C example is not really concurrent. See https://docs.python.org/3/library/threading.html: > In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at...
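A truly parallel alternative would use processes rather than threads, since each process has its own interpreter and GIL. A minimal sketch, assuming a hypothetical `worker()` function standing in for one agent's training loop:

```python
from multiprocessing import Process

def worker(worker_id):
    # Each process runs in its own interpreter with its own GIL,
    # so Python bytecode executes in parallel across workers.
    print(f"worker {worker_id} running in its own process")

if __name__ == "__main__":
    procs = [Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

That said, threads can still overlap useful work here when time is spent inside the environment or inside TensorFlow ops, which release the GIL; the quoted limitation applies to pure-Python code.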
I could be wrong, but it does not seem that you are annealing the bias with importance sampling as suggested in the PER paper (Section 3.4): w_i = (1/N * 1/P(i))^beta...
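For reference, a minimal sketch of the importance-sampling weights the PER paper describes, assuming `priorities` is a NumPy array of the per-sample priorities p_i^alpha:

```python
import numpy as np

def is_weights(priorities, beta):
    """Importance-sampling weights from the PER paper (Section 3.4):
        w_i = (1/N * 1/P(i))^beta
    normalized by the maximum weight for stability, as the paper suggests."""
    probs = priorities / priorities.sum()  # sampling probabilities P(i)
    N = len(priorities)
    w = (N * probs) ** (-beta)             # equivalent to (1/N * 1/P(i))^beta
    return w / w.max()                     # scale so the largest weight is 1
```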
Hello, first of all, congrats on the article. I'm using it for study, and I'm trying to run your code to understand it better. So, I have some questions: - Do...