
Should `TensorboardWriter` close its `tf.summary.FileWriter`?

Open · shwang opened this issue 4 years ago · 4 comments

PPO2 uses a `with TensorboardWriter(...) as writer:` context that flushes, but never closes, its `tf.summary.FileWriter`. In combination with another problem on my side, this led to a "too many files are opened by this process" error in one of my runs when I called `PPO2.learn()` repeatedly.

Maybe the intention here is to let us access the same `FileWriter` later, but a second call to `PPO2.learn()` in fact opens a new events file and creates a new `FileWriter`, which again is not closed by the time `learn()` exits.

Relevant lines in `TensorboardWriter`:

https://github.com/hill-a/stable-baselines/blob/6347da3abcb3196f468ab9f46e97c9c2afb8111d/stable_baselines/common/base_class.py#L1137-L1145

https://github.com/hill-a/stable-baselines/blob/6347da3abcb3196f468ab9f46e97c9c2afb8111d/stable_baselines/common/base_class.py#L1161-L1164
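A minimal sketch of a context manager that both flushes and closes the writer it opens, so repeated `learn()` calls do not accumulate open file handles. `ClosingWriterContext` and `FakeWriter` are hypothetical names, not part of stable-baselines. The sketch only assumes the writer exposes `flush()` and `close()`, as `tf.summary.FileWriter` does. `FakeWriter` stands in for the real writer so the example runs without TensorFlow.

```python
class ClosingWriterContext:
    """Hypothetical sketch: flush *and* close the writer on exit."""

    def __init__(self, make_writer):
        # make_writer: zero-argument callable returning an object with
        # flush() and close() (e.g. a factory for tf.summary.FileWriter).
        self._make_writer = make_writer
        self.writer = None

    def __enter__(self):
        self.writer = self._make_writer()
        return self.writer

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.writer is not None:
            self.writer.flush()
            self.writer.close()  # release the events-file handle
            self.writer = None
        return False  # never swallow exceptions from the training loop


class FakeWriter:
    """Stand-in writer that counts how many instances are still open."""
    open_count = 0

    def __init__(self):
        FakeWriter.open_count += 1
        self.closed = False

    def flush(self):
        pass

    def close(self):
        if not self.closed:
            self.closed = True
            FakeWriter.open_count -= 1


# Simulate calling learn() ten times in a row.
for _ in range(10):
    with ClosingWriterContext(FakeWriter) as w:
        pass  # summaries would be written here

print(FakeWriter.open_count)  # 0
```

Closing on exit fixes the handle leak, but each call would still produce its own events file; it does not address the growing file list on its own.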

shwang, May 14 '20 04:05

Maybe the context flushes instead of closing because we are supposed to reuse the old TensorBoard `FileWriter` when possible.

That way we wouldn't create a new `FileWriter`, and therefore a new events file, every time we call `PPO2.learn(reset_num_timesteps=False)`.

I'm ending up with a long and growing list of files like:

├── sb_tb
│   └── PPO2_1
│       ├── events.out.tfevents.1589433242.spinach
│       ├── events.out.tfevents.1589433245.spinach
│       ├── events.out.tfevents.1589433248.spinach
│       ├── events.out.tfevents.1589433250.spinach
│       ├── events.out.tfevents.1589433253.spinach
│       ├── events.out.tfevents.1589433255.spinach
│       ├── events.out.tfevents.1589433257.spinach
│       ├── events.out.tfevents.1589433260.spinach
│       ├── events.out.tfevents.1589433262.spinach
│       └── events.out.tfevents.1589433265.spinach

Granted, I can just rely on the episode reward mean logged by `Monitor` and `logger.logkv()`, which don't use this `TensorboardWriter` context, so enabling it isn't critical for me.
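Reusing the old writer, as suggested above, could look roughly like the sketch below. One writer is cached per log directory and returned on every later request, so repeated `learn(reset_num_timesteps=False)` calls would append to a single events file. `WriterCache` and `FakeWriter` are hypothetical illustrations, not stable-baselines APIs, and treating each `learn()` call as one `get()` call is an assumption.

```python
class WriterCache:
    """Hypothetical sketch: one writer per log directory, reused across calls."""

    def __init__(self, make_writer):
        # make_writer(log_dir) -> object with flush() and close()
        self._make_writer = make_writer
        self._writers = {}

    def get(self, log_dir):
        # Create a writer only the first time this directory is requested.
        if log_dir not in self._writers:
            self._writers[log_dir] = self._make_writer(log_dir)
        return self._writers[log_dir]

    def close_all(self):
        # The cache owns the handles, so it must close them eventually.
        for writer in self._writers.values():
            writer.close()
        self._writers.clear()


class FakeWriter:
    """Stand-in writer that counts how many instances were created."""
    created = 0

    def __init__(self, log_dir):
        FakeWriter.created += 1
        self.log_dir = log_dir
        self.closed = False

    def flush(self):
        pass

    def close(self):
        self.closed = True


cache = WriterCache(FakeWriter)
for _ in range(10):  # ten learn(reset_num_timesteps=False) calls
    writer = cache.get("sb_tb/PPO2_1")
    writer.flush()

print(FakeWriter.created)  # 1
cache.close_all()
```

The trade-off is that the cached handle stays open until `close_all()` is called (for example when the model is discarded), so something still has to own the writer's lifetime.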

shwang, May 14 '20 05:05

Hello,

This may be a duplicate of https://github.com/hill-a/stable-baselines/issues/501, but it really does sound like a bug.

araffin, May 14 '20 07:05

Does setting `new_tb_log=False` here not work?

Jiankai-Sun, May 20 '20 02:05

> Does setting `new_tb_log=False` here not work?

There is an issue about that: https://github.com/hill-a/stable-baselines/issues/599#issuecomment-561709799

araffin, May 20 '20 07:05