reinforcejs

Reinforcement Learning Agents in Javascript (Dynamic Programming, Temporal Difference, Deep Q-Learning, Stochastic/Deterministic Policy Gradients)

Results 25 reinforcejs issues

I am on the latest version of Chrome on macOS, on the page https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html, and I don't see the TeX equations being rendered properly. It looks from here: https://tex.stackexchange.com/questions/299523/how-can-i-compile-tex-code-appearing-on-websites, that MathJax...

https://github.com/karpathy/reinforcejs/blob/08d2030d13b6a64ee4dd4ed75d1cd273a46aa8de/lib/rl.js#L460 On line 460 there is a call to make a new random matrix, and the second argument is `hidden_size` which is only actually defined inside the for loop above...

How hard would it be to implement this? I'm trying ReinforceJS on the 2048 game here: https://github.com/NullVoxPopuli/doctor-who-thirteen-game-ai/blob/master/worker.js#L105 and I've noticed a couple of things: - the AI gets to its best score...

Using the initial settings, how can the discounted reward of the center field be 1.1? The max reward the agent can get is 1.0 and then the goal is reached...

After changing a cell's reward, one can never change it back to 0.00. The least possible amount to be chosen is always -0.1 or 0.1. ![image](https://user-images.githubusercontent.com/9498649/117164241-bb26a780-adc4-11eb-9187-6a7ee2fa98f4.png)

In case it's interesting or you have a sample gallery: Breakout with deep Q-learning (since that's the YouTube video example originally shown for such games) http://4quant.com/javascript-breakout/ repo: https://github.com/4Quant/javascript-breakout

1. Should all state inputs to act be 0

…use existing functions with different arguments. The main part I am sharing is the method of queuing backprop functions. I suspect that this might be less expensive than creating new...

Hi, I would like to contribute an Inverse Pendulum Domain.... But to my shame, it is not yet well enough evaluated.