Deep-QLearning-Agent-for-Traffic-Signal-Control
Reward
The reward is the sum of the cumulative wait time, right? Then how is it going negative in the graph (after running testing_main.py)?
I think the reward is defined as the difference in cumulative wait time between consecutive action intervals, so it can be either positive or negative.
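To illustrate why a difference-based reward can go negative, here is a minimal sketch. It assumes the reward is the previous total wait time minus the current one; that sign convention, the helper name `step_reward`, and the sample numbers are all hypothetical and not taken from the repository.

```python
def step_reward(previous_total_wait: float, current_total_wait: float) -> float:
    """Reward for one action interval under the assumed convention:
    positive when the total wait time dropped, negative when it grew."""
    return previous_total_wait - current_total_wait


# Hypothetical cumulative wait times (seconds), sampled at each action step.
total_wait_per_step = [0.0, 12.0, 30.0, 25.0, 40.0]

# Pair each sample with the next one and compute the per-step reward.
rewards = [
    step_reward(prev, curr)
    for prev, curr in zip(total_wait_per_step, total_wait_per_step[1:])
]

print(rewards)       # [-12.0, -18.0, 5.0, -15.0]
print(sum(rewards))  # -40.0
```

Under this assumption, any step where queues grow yields a negative reward, and the episode total equals the first sample minus the last, so it is negative whenever waiting time is higher at the end than at the start.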