Out of bounds error in Categorical DQN

Open redknightlois opened this issue 5 years ago • 0 comments

I have been playing with the agent and noticed that my Q values are clustered in the range [10..20], so I set vmin=10 and vmax=20. However, if vmin is greater than 0, training fails with an index error:

Traceback (most recent call last):
  File "E:\Src\trendstone-git\src\tf-next3\agents\v1\train_rl.py", line 203, in <module>
    graph_manager.improve()
  File "C:\Anaconda3\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 547, in improve
    self.train_and_act(self.steps_between_evaluation_periods)
  File "C:\Anaconda3\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 482, in train_and_act
    self.train()
  File "C:\Anaconda3\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in train
    [manager.train() for manager in self.level_managers]
  File "C:\Anaconda3\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in <listcomp>
    [manager.train() for manager in self.level_managers]
  File "C:\Anaconda3\lib\site-packages\rl_coach\level_manager.py", line 187, in train
    [agent.train() for agent in self.agents.values()]
  File "C:\Anaconda3\lib\site-packages\rl_coach\level_manager.py", line 187, in <listcomp>
    [agent.train() for agent in self.agents.values()]
  File "C:\Anaconda3\lib\site-packages\rl_coach\agents\agent.py", line 737, in train
    total_loss, losses, unclipped_grads = self.learn_from_batch(batch)
  File "C:\Anaconda3\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 146, in learn_from_batch
    m[batches, u] += (distributional_q_st_plus_1[batches, target_actions, j] * (bj - l))
IndexError: index 51 is out of bounds for axis 1 with size 51
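
One possible explanation is floating-point rounding in the bin index. The sketch below is not copied from `categorical_dqn_agent.py`. It assumes that the index `bj` is computed as `(target - vmin) / (z[1] - z[0])` and that `u` is its ceiling; the names `z_values`, `delta_z` and `bj_safe` are hypothetical. Under those assumptions, a target clipped to vmax=20 lands just above bin 50 and its ceiling becomes 51.

```python
import math

# Assumed support, matching the settings in this report: 51 atoms on [10, 20].
v_min, v_max, n_atoms = 10.0, 20.0, 51

# Evenly spaced atoms (hypothetical reconstruction of the support).
step = (v_max - v_min) / (n_atoms - 1)
z_values = [v_min + i * step for i in range(n_atoms)]

# Assumption: the bin width is taken as the difference of the first two atoms.
# 10.2 is not exactly representable, so this comes out slightly below 0.2.
delta_z = z_values[1] - z_values[0]

# A target already clipped to v_max should map to the last bin (index 50),
# but dividing by the slightly-too-small width pushes it just past 50.
bj = (v_max - v_min) / delta_z
u = math.ceil(bj)  # 51 here, one past the last valid index
print(delta_z, bj, u)

# Hypothetical guard: clamp the fractional index before taking floor/ceil.
bj_safe = min(max(bj, 0.0), n_atoms - 1)
print(math.floor(bj_safe), math.ceil(bj_safe))  # both stay within 0..50
```

If this is what happens, clamping the fractional index (or deriving the bin width from `(vmax - vmin) / (n_atoms - 1)` instead of from two neighbouring atoms) would keep `u` in range. Whether the agent needs exactly this change is untested.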

redknightlois, Jun 28 '19 18:06