deep-rl-class
Suggested update example Unit 1: discounting
I have a suggestion to update an example in the blog post for Unit 1. I'm probably being somewhat nitpicky, and in general the example already works well to get the point across to most beginners, but technically I think it could be more precise (without making it harder to understand).
The issue is with the example of the cat near the bigger piles of cheese, and the explanation of why discounting is used in RL. See picture below:
The explanation sort of implies that in this case the discounting is related to the spatial proximity of the potential hazard (the cat) to the bigger piles of cheese. But discounting is of course solely about temporal aspects. In fact, in this example, if the initial position of the cat had been at the bottom of the screen, we would still be discounting the larger piles of cheese at the top due to their (temporal) distance. And if, for whatever reason, our mouse happens to have already moved closer to the top of the screen, then from that point onwards it is the single-cheese cells at the bottom that get discounted more heavily.
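To make this concrete, here is a minimal sketch in Python. The reward sizes, step counts, and `gamma = 0.9` are made-up numbers for illustration, not values from the course, and `discounted_value` is a hypothetical helper. It uses the convention that a reward arriving `k` steps in the future is weighted by `gamma ** k` (some texts shift the exponent by one; the point is unchanged). Note that the cat's position is not an input anywhere: only the number of time steps matters.

```python
def discounted_value(reward: float, steps_away: int, gamma: float = 0.9) -> float:
    """Present value of a reward that is collected `steps_away` time steps from now."""
    return (gamma ** steps_away) * reward


# Illustrative numbers only (assumptions, not taken from the course).
SMALL_CHEESE = 1.0
BIG_PILE = 10.0

# Mouse at its starting cell: small cheese is 1 step away, big pile is 8 steps away.
at_start = {
    "small": discounted_value(SMALL_CHEESE, steps_away=1),
    "big": discounted_value(BIG_PILE, steps_away=8),
}

# Mouse has already moved towards the top: the distances are swapped.
near_top = {
    "small": discounted_value(SMALL_CHEESE, steps_away=8),
    "big": discounted_value(BIG_PILE, steps_away=1),
}

# Fraction of each reward's face value that survives discounting.
kept_at_start = {"small": at_start["small"] / SMALL_CHEESE, "big": at_start["big"] / BIG_PILE}
kept_near_top = {"small": near_top["small"] / SMALL_CHEESE, "big": near_top["big"] / BIG_PILE}

print(kept_at_start)
print(kept_near_top)
```

At the start the big pile keeps the smaller fraction of its face value; once the mouse is near the top, it is the small cheese that is discounted more, exactly as described above.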
As a potential solution, I would suggest removing the cat altogether and providing examples of "invisible" hazards instead. For example, maybe we prefer to eat the cheese quickly because its taste gets worse over time. Or maybe the episode simply has a random stopping time, so if we run for the bigger piles of cheese there is a (randomised) risk that we might not arrive in time.
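The random stopping time interpretation can be checked numerically. The sketch below is my own illustration under stated assumptions: after every step the episode continues with probability `gamma` (independently of everything else), and a reward is only collected if the episode is still running when the mouse reaches it. Under those assumptions the expected collected reward equals `gamma ** steps_away * reward`, which is the discounted value. The function name `survival_fraction` and all numbers are hypothetical.

```python
import random


def survival_fraction(steps_away: int, gamma: float,
                      episodes: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability that an episode lasts `steps_away` steps."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    survived = 0
    for _ in range(episodes):
        # The episode survives each step independently with probability gamma.
        if all(rng.random() < gamma for _ in range(steps_away)):
            survived += 1
    return survived / episodes


# Illustrative numbers only (assumptions).
gamma, steps_away, reward = 0.9, 5, 10.0

estimate = reward * survival_fraction(steps_away, gamma)  # expected reward with random stopping
exact = reward * gamma ** steps_away                      # discounted value of the same reward

print(f"simulated: {estimate:.3f}, discounted: {exact:.3f}")
```

The two numbers should agree up to sampling noise, which is why "the episode might end before we get there" works as an explanation of discounting without any visible hazard on the screen.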
Hey there 👋, thanks for the feedback and for pointing this out. I agree it might be misleading. I'm adding it to the updates I'll need to make for the first units.
Have a nice day 🤗