deep-reinforcement-learning
Repo for the Deep Reinforcement Learning Nanodegree program
I ran REINFORCE on my lab's server, which has an RTX 3090. Surprisingly, GPU usage is only about 30 percent when running REINFORCE on one card. At the same...
I have been trying to set this up on macOS 12.2.1 Monterey, but all of the software (unityagents, torch, etc.) is so old that it doesn't install. I used some more...
There should be additional comments for successful installation of `gym[box2d]`. - Related post on Knowledge: https://knowledge.udacity.com/questions/728713
OUNoise should use a normal distribution. The current implementation uses `random.random()`, which I believe draws from a uniform distribution over [0, 1). This can negatively affect the exploration ability of the DDPG agent, since noise will...
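To illustrate the change this issue asks for, here is a minimal sketch of an Ornstein-Uhlenbeck process whose increments are drawn from a standard normal distribution. The class layout, the default values of `mu`, `theta`, and `sigma`, and the `seed` parameter are assumptions made for the example and are not taken from the repository's code.

```python
import numpy as np


class OUNoise:
    """Ornstein-Uhlenbeck process with Gaussian increments (illustrative sketch)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=None):
        # Hypothetical defaults; tune them for the task at hand.
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        """Reset the internal state to the long-run mean."""
        self.state = self.mu.copy()

    def sample(self):
        """Advance the process one step and return the new state."""
        # standard_normal() is zero-mean, so the noise pushes in both directions,
        # unlike a uniform draw on [0, 1), which is always non-negative.
        gaussian = self.rng.standard_normal(self.state.shape)
        dx = self.theta * (self.mu - self.state) + self.sigma * gaussian
        self.state = self.state + dx
        return self.state.copy()
```

With a zero-mean increment, the first sample from a freshly reset process is equally likely to be positive or negative, which is the behaviour the issue argues exploration needs.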
In this project we can clearly see that no learning is happening: https://github.com/udacity/deep-reinforcement-learning/blob/master/ddpg-bipedal/DDPG.ipynb This example should converge and solve the problem.
Make the default for fullscreen False, as it can be irritating for people to figure out how to get out of fullscreen mode. It is also easier to work with...
Hello, in deep-reinforcement-learning/reinforce/REINFORCE.ipynb, R is implemented as a single value in the following code: `discounts = [gamma**i for i in range(len(rewards)+1)]` `R = sum([a*b for a,b in zip(discounts, rewards)])`...
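The quoted snippet collapses the whole episode into one discounted return. The issue text is cut off, so the following is only a guess at the alternative it may be contrasting against: a per-timestep discounted return, often called reward-to-go. The function names `episode_return` and `rewards_to_go` are hypothetical and do not come from the notebook.

```python
def episode_return(rewards, gamma):
    """Single discounted return for the whole episode, as in the quoted snippet."""
    return sum(gamma**i * r for i, r in enumerate(rewards))


def rewards_to_go(rewards, gamma):
    """Discounted return from each timestep onward (one value per step)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = [1.0, 1.0, 1.0]
print(episode_return(rewards, gamma=0.5))  # → 1.75
print(rewards_to_go(rewards, gamma=0.5))   # → [1.75, 1.5, 1.0]
```

The first entry of the per-step list equals the single episode return; later entries ignore rewards that were collected before that step.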
Hello. In the file deep-reinforcement-learning/ddpg-pendulum/DDPG.ipynb, which uses the [Pendulum-v0](https://github.com/openai/gym/wiki/Pendulum-v0) environment, the actions range from -2.0 to +2.0. Hence, actions must be scaled before passing to the...
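A minimal sketch of the scaling this issue describes, assuming the actor emits values in [-1, 1] (for example from a `tanh` output layer, which is an assumption here and not stated in the issue). The helper name `scale_action` is hypothetical.

```python
import numpy as np


def scale_action(raw_action, low=-2.0, high=2.0):
    """Map an action from [-1, 1] linearly onto [low, high]."""
    raw_action = np.clip(np.asarray(raw_action, dtype=float), -1.0, 1.0)
    # Shift [-1, 1] to [0, 1], then stretch to the environment's range.
    return low + 0.5 * (raw_action + 1.0) * (high - low)


print(scale_action([-1.0, 0.0, 1.0]))  # → [-2.  0.  2.]
```

Clipping before scaling keeps the result inside the environment's bounds even after exploration noise has been added to the raw action.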
- correct discretization of [ 0.2 , -1.9] => [5, 3] - correct axis labels of velocity and position - fix positioning of action values Try to create a rectangular...
In the [Discretization Solution notebook](https://github.com/udacity/deep-reinforcement-learning/blob/master/discretization/Discretization_Solution.ipynb), the point `[0.2, -1.9]` should be mapped to grid cell `[6, 3]`, as described before `In [8]`. But the output of `In [8]` is `[5, 3]` instead....
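For context, uniform-grid discretization can be sketched with `np.linspace` and `np.digitize`. The bounds `[-1, -5]` to `[1, 5]`, the 10 bins per dimension, and the helper names below are assumptions chosen for illustration; the notebook's actual values may differ.

```python
import numpy as np


def create_uniform_grid(low, high, bins):
    """Return the interior split points for each dimension."""
    return [np.linspace(lo, hi, n + 1)[1:-1] for lo, hi, n in zip(low, high, bins)]


def discretize(sample, grid):
    """Return the bin index of each coordinate of `sample`."""
    return [int(np.digitize(x, edges)) for x, edges in zip(sample, grid)]


# Assumed bounds and bin counts, for illustration only.
grid = create_uniform_grid(low=[-1.0, -5.0], high=[1.0, 5.0], bins=(10, 10))
print(discretize([0.25, -1.9], grid))  # → [6, 3]
```

A value that sits exactly on a split point, such as 0.2 under these assumed bounds, is sensitive to floating-point rounding in the split points that `np.linspace` computes, which is one plausible source of a 5-versus-6 discrepancy like the one reported.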