Memory leak when calling self.sess.run in _train_step of PPO2 [question]

Open denyHell opened this issue 5 years ago • 4 comments

I am seeing memory usage increase every time self.sess.run (in PPO2) is called. The profile is shown below:

Line #    Mem usage    Increment   Line Contents
================================================
   375                                             writer.add_run_metadata(run_metadata, 'step%d' % (update * update_fac))
   376                                         else:
   377   4361.3 MiB      0.0 MiB                   _, summary, policy_loss, value_loss, policy_entropy, approxkl, clipfrac = self.sess.run(
   378   4361.3 MiB      0.0 MiB                       [self._train, self.summary, self.pg_loss, self.vf_loss, self.entropy, self.approxkl, self.clipfrac],
   379   4362.8 MiB      1.5 MiB                       td_map)
   380   4362.8 MiB      0.0 MiB               writer.add_summary(summary, (update * update_fac))
   381                                     else:
   382                                         _, policy_loss, value_loss, policy_entropy, approxkl, clipfrac = self.sess.run(
   383                                             [self._train, self.pg_loss, self.vf_loss, self.entropy, self.approxkl, self.clipfrac], td_map)
   384   4362.8 MiB      0.0 MiB           return policy_loss, value_loss, policy_entropy, approxkl, clipfrac

as well as

   363   4374.4 MiB      0.0 MiB               if self.full_tensorboard_log and (1 + update) % 10 == 0:
   364   4374.4 MiB      0.0 MiB                   run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
   365   4374.4 MiB      0.0 MiB                   run_metadata = tf.RunMetadata()
   366   4374.4 MiB      0.0 MiB                   _, summary, policy_loss, value_loss, policy_entropy, approxkl, clipfrac = self.sess.run(
   367   4374.4 MiB      0.0 MiB                       [self._train, self.summary, self.pg_loss, self.vf_loss, self.entropy, self.approxkl, self.clipfrac],
   368   4405.6 MiB     31.1 MiB                       td_map, options=run_options, run_metadata=run_metadata)
   369                                             
   370                                             # tl = timeline.Timeline(run_metadata.step_stats)
   371                                             # ctf = tl.generate_chrome_trace_format()
   372                                             # with open('timeline.json', 'w') as f:
   373                                             #     f.write(ctf)
   374                             
   375   4490.1 MiB     84.6 MiB                   writer.add_run_metadata(run_metadata, 'step%d' % (update * update_fac))
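
For reference, line-by-line output in this format is what the memory_profiler package produces. A minimal sketch of how such a profile can be generated; the script and function names here are only illustrative, not the actual PPO2 code:

# Minimal, self-contained sketch of line-by-line memory profiling with the
# memory_profiler package (function and variable names are illustrative).
# Run with:  python profile_demo.py
from memory_profiler import profile
import numpy as np

@profile
def train_step():
    # Each line's memory increment is reported separately, producing the same
    # "Line # / Mem usage / Increment / Line Contents" layout shown above.
    batch = np.random.rand(1000, 1000)  # the allocation shows up as an increment
    result = batch.sum()
    return result

if __name__ == "__main__":
    train_step()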

I added self.sess.graph.finalize() when initializing the graph, so I don't think any additional ops are being added. I am just curious what could be causing this issue.
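
For context, finalizing the graph makes it read-only, so any op created afterwards raises an error instead of silently growing the graph. A minimal sketch of that check (a standalone TF 1.x example, not the actual PPO2 graph):

# Sketch (TF 1.x): a finalized graph rejects new ops with a RuntimeError,
# which rules out graph growth as the source of a leak.
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 4])
    y = tf.reduce_sum(x)

sess = tf.Session(graph=graph)
sess.graph.finalize()  # done once, after building the model

try:
    with graph.as_default():
        tf.constant(1.0)  # would leak ops if the graph were not finalized
except RuntimeError as err:
    print("graph is finalized:", err)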

I am using the following versions:

Python: 3.6.9
TensorFlow: 1.15
NumPy: 1.18.1
OS: Red Hat Enterprise Linux Server 7.5 (Maipo)

The only additional info I can think of is that I am using a custom env that I designed myself. I don't think the env is causing the issue, because if the training step (i.e., self.sess.run) is commented out, memory usage does not increase at all.

The model I am training is defined by:

from stable_baselines import PPO2
from stable_baselines.common.policies import LstmPolicy

nminibatches = 4
n_steps = 256
policy_kwargs = {"net_arch": [128, 'lstm', dict(vf=[128], pi=[128, 128])]}
tb_log = "tensorboard"

# env is the custom environment mentioned above
model = PPO2(policy=LstmPolicy, env=env, n_steps=n_steps, nminibatches=nminibatches,
             lam=0.95, gamma=1., noptepochs=4, ent_coef=0.00001,
             learning_rate=lambda f: f, cliprange=0.2, verbose=1, tensorboard_log=tb_log,
             policy_kwargs=policy_kwargs, full_tensorboard_log=True)
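
One way to narrow this down would be to log the process RSS once per update through a learn() callback. A minimal sketch, assuming the old-style function callbacks of stable-baselines 2.x and that psutil is available; the callback name and the "update" key are assumptions about PPO2's internals:

# Hypothetical helper: track resident memory during training with a
# stable-baselines 2.x function callback (called from inside the update loop).
import os
import psutil

_process = psutil.Process(os.getpid())

def memory_callback(locals_, globals_):
    rss_mb = _process.memory_info().rss / 1024 ** 2
    update = locals_.get("update")  # assumed to be PPO2's update counter
    print("update {}: RSS = {:.1f} MiB".format(update, rss_mb))
    return True  # returning False would stop training early

# model.learn(total_timesteps=100000, callback=memory_callback)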

Let me know if any additional info is required. Thanks!

denyHell · Feb 24 '20 19:02

Hello, please fill in the issue template completely.

araffin · Feb 24 '20 19:02

@araffin Hi, I added some more info. Let me know what additional info is needed. Thanks in advance!

denyHell · Feb 24 '20 19:02

How did you do the profiling? Is the issue reproducible using a standard environment?

AlessandroZavoli · Jul 18 '20 14:07

I have a similar issue: a terrible memory leak that stops Python completely after a few minutes of pretraining with PPO2 (normal training has no problems). I have no idea where in the code it happens; standard tools like objgraph or tracemalloc show nothing. Could it be a native memory issue?
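
A minimal sketch of the kind of check that separates a Python-heap leak from a native one, assuming psutil is available (nothing here is taken from the profiles above): if the process RSS keeps growing while tracemalloc's tracked total stays flat, the allocations happen outside the Python allocator, e.g. inside the TensorFlow runtime.

# Sketch: compare Python-tracked allocations (tracemalloc) with the process
# resident set size (psutil). A growing gap points to a native-memory leak.
import os
import tracemalloc
import psutil

tracemalloc.start()
proc = psutil.Process(os.getpid())

def report(tag):
    current, _peak = tracemalloc.get_traced_memory()
    rss = proc.memory_info().rss
    print("{}: python-tracked {:.1f} MiB, process RSS {:.1f} MiB".format(
        tag, current / 1024 ** 2, rss / 1024 ** 2))

report("before")
# ... run a few pretraining / training iterations here ...
report("after")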

@denyHell What tool did you use to get your results?

iirekm · Oct 06 '20 06:10