
Memory Leak in DDQN

Open xiboli opened this issue 1 year ago • 2 comments

Thank you so much for implementing the Double DQN algorithm. However, when I run it, memory usage increases consistently during training. Do you have any idea where the memory leak could be happening?

https://github.com/ChuaCheowHuan/reinforcement_learning/blob/master/DQN_variants/DDQN/double_dqn_cartpole.py#L339

xiboli avatar Nov 07 '23 17:11 xiboli

I have found that huber_loss with GradientDescentOptimizer causes the memory leak; when I changed to reduce_mean of the squared difference with RMSPropOptimizer, it disappears. Can you explain why you used the Huber loss with the gradient descent optimizer? Thank you so much.

    with tf.variable_scope('loss'):
        self.loss = tf.reduce_mean(tf.squared_difference(td_target, predicted_Q_val))  # tf.losses.huber_loss(td_target, predicted_Q_val)
    with tf.variable_scope('optimizer'):
        self.optimizer = tf.train.RMSPropOptimizer(self.learning_rate).minimize(self.loss)  # tf.train.GradientDescentOptimizer(self.learning_rate).minimize(self.loss)
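
For what it's worth, a common cause of steadily growing memory in TF1-style code is accidentally adding new ops to the default graph inside the training loop, rather than the loss function itself. A minimal diagnostic sketch (not from this repo): finalize the graph after construction, so any op created during training raises an error instead of silently leaking.

    import tensorflow as tf

    # Build the graph exactly once: placeholders, Q-networks, loss, optimizer.
    # (Construction elided here; see double_dqn_cartpole.py for the real graph.)
    init_op = tf.global_variables_initializer()

    # Freeze the graph: any tf.* call that tries to add a node during training
    # now raises a RuntimeError, pinpointing a graph-growth leak.
    tf.get_default_graph().finalize()

    with tf.Session() as sess:
        sess.run(init_op)
        # Training loop: only sess.run(...) calls, no new graph construction.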

xiboli avatar Nov 09 '23 08:11 xiboli

Thanks for your observation. I wasn't aware that the "leak" was associated with the Huber loss function and sadly don't know why that should be, but I will make a note to check it out once things here subside to a dull roar, so to speak.

Until we can evaluate the impact of a change of loss function, production code here is avoiding batched inputs to model.fit(); instead it fits in a loop and periodically saves, clears, and reloads the model. This stopgap keeps memory (64 GB) from being completely consumed before convergence is reached.

If it's of any interest, the restart algo is triggered by the following call, placed at a convenient spot in the model.fit() loop:

agent.save_restart(repeat, idx)

where "agent" is a class instance containing the model and its methods, as follows:

  def save_restart(self, repeat, idx):
    # Save the current model, clear the TF session to release accumulated
    # graph/optimizer state, then reload the model from disk.
    self.last_model = f'{self.model_path}/{self.model_name}_{repeat}_{idx}'
    self.model.save(self.last_model)
    tf.keras.backend.clear_session()
    self.load_last_model()

  def load_last_model(self):
    # Reassign self.model so the instance actually uses the reloaded model.
    self.model = load_model(self.last_model, custom_objects=self.custom_objects, compile=False)
    self.model.compile(optimizer=self.optimizer, loss=self.loss())

  def save(self, repeat, idx):
    self.last_model = f'{self.model_path}/{self.model_name}_{repeat}_{idx}'
    self.model.save(self.last_model)
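
For completeness, a hypothetical outer loop showing how save_restart might be invoked; the chunking variables (chunks, restart_every) are illustrative assumptions, not names from the actual production code:

    # Illustrative only: fit on small chunks and periodically save/clear/reload
    # the model so memory growth stays bounded. `chunks` and `restart_every`
    # are assumed names, not part of the original code.
    for idx, (x_chunk, y_chunk) in enumerate(chunks):
        agent.model.fit(x_chunk, y_chunk, epochs=1, verbose=0)
        if idx > 0 and idx % restart_every == 0:
            agent.save_restart(repeat, idx)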


bcnichols avatar Feb 05 '24 12:02 bcnichols