coach
Out of bounds error in Categorical DQN
I have been playing with the agent and noticed that my Q values are clustered in the range [10..20], so I set vmin=10 and vmax=20. However, if vmin is greater than 0, training fails with an index error:
Traceback (most recent call last):
  File "E:\Src\trendstone-git\src\tf-next3\agents\v1\train_rl.py", line 203, in <module>
    graph_manager.improve()
  File "C:\Anaconda3\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 547, in improve
    self.train_and_act(self.steps_between_evaluation_periods)
  File "C:\Anaconda3\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 482, in train_and_act
    self.train()
  File "C:\Anaconda3\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in train
    [manager.train() for manager in self.level_managers]
  File "C:\Anaconda3\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in <listcomp>
    [manager.train() for manager in self.level_managers]
  File "C:\Anaconda3\lib\site-packages\rl_coach\level_manager.py", line 187, in train
    [agent.train() for agent in self.agents.values()]
  File "C:\Anaconda3\lib\site-packages\rl_coach\level_manager.py", line 187, in <listcomp>
    [agent.train() for agent in self.agents.values()]
  File "C:\Anaconda3\lib\site-packages\rl_coach\agents\agent.py", line 737, in train
    total_loss, losses, unclipped_grads = self.learn_from_batch(batch)
  File "C:\Anaconda3\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 146, in learn_from_batch
    m[batches, u] += (distributional_q_st_plus_1[batches, target_actions, j] * (bj - l))
IndexError: index 51 is out of bounds for axis 1 with size 51
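
For anyone trying to reproduce or work around this, here is a minimal standalone sketch of the categorical projection step with the upper index guarded. It is not the rl_coach implementation: the function name `project_distribution` and its parameters are hypothetical, and the guess that `u` reaches 51 because `bj` ends up slightly above 50 (for example from floating-point rounding when vmin is non-zero) is an assumption I have not verified against the library source.

```python
import numpy as np


def project_distribution(next_probs, rewards, dones, v_min, v_max, discount=0.99):
    """Project the shifted target distribution back onto the fixed support.

    Hypothetical helper for illustration only; not part of rl_coach.
    next_probs: (batch, num_atoms) probabilities of the chosen next action.
    """
    batch_size, num_atoms = next_probs.shape
    z = np.linspace(v_min, v_max, num_atoms)      # support atoms
    delta_z = (v_max - v_min) / (num_atoms - 1)   # atom spacing
    rows = np.arange(batch_size)
    m = np.zeros_like(next_probs)

    for j in range(num_atoms):
        # Bellman update of atom j, clipped to the support range.
        tz = np.clip(rewards + (1.0 - dones) * discount * z[j], v_min, v_max)
        b = (tz - v_min) / delta_z
        # Guard (assumption about the failure mode): keep the fractional
        # index inside [0, num_atoms - 1] so ceil(b) can never be num_atoms.
        b = np.clip(b, 0, num_atoms - 1)
        l = np.floor(b).astype(int)
        u = np.ceil(b).astype(int)
        # When b is an integer, l == u and both interpolation weights would
        # be zero; assign the whole mass to that single atom instead.
        same = l == u
        m[rows, l] += next_probs[:, j] * np.where(same, 1.0, u - b)
        m[rows, u] += next_probs[:, j] * np.where(same, 0.0, b - l)
    return m


# Same setting as in the report: 51 atoms, vmin=10, vmax=20.
probs = np.full((3, 51), 1.0 / 51)
rewards = np.array([0.0, 1.0, 25.0])
dones = np.array([0.0, 0.0, 1.0])
projected = project_distribution(probs, rewards, dones, v_min=10.0, v_max=20.0)
```

With the clip on `b` in place, the indices stay within the 51 atoms for any reward, and each projected row still sums to 1.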