Example of using the TFSummaryLogger to use acme with TensorBoard
I'm attempting to use some of the baseline examples (e.g., the DQN agent trained on Atari) with TensorBoard. I found the TFSummaryLogger and attempted to replace the default logger with it. This creates the tfrecord file in the expected place, but does not populate it with results. I also attempted to replicate what the default logger does, substituting the TFSummaryLogger for the CSVLogger. This gave the same result.
Is there a working example of using the TFSummaryLogger with acme, i.e. one that generates artifacts that can be viewed in TensorBoard?
It's interesting that this didn't work out of the box. I suspect the issue is with buffered writes. Can you try closing the logger when you finish writing, to see whether the results then appear?
Sure, I can flush the logger before terminating. Digging in a little, I don't think the logger is being flushed at all in the training loop. There is exactly one mention of the TFSummaryLogger in the repo, and it's the definition.
@rdevon The flushing should be done automatically by TB, but maybe that's not working in your case. There is an auto-close wrapper, https://github.com/deepmind/acme/blob/master/acme/utils/loggers/auto_close.py, that you can wrap around the logger; it should close the logger on exit, which flushes the TB writer.
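Something like this, for instance (a minimal sketch; the import paths are my assumption based on the file above, and the label and steps key are just illustrative):

```python
from acme.utils.loggers.auto_close import AutoCloseLogger
from acme.utils.loggers.tf_summary import TFSummaryLogger

# Wrap the TB logger so that closing the wrapper (or letting it be
# garbage collected at exit) closes, and therefore flushes, the writer.
logger = AutoCloseLogger(
    TFSummaryLogger('/tmp/summaries', label='learner', steps_key='learner_steps'))
logger.write({'loss': 0.5, 'learner_steps': 1})
```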
If this still doesn't work, one other thing I just thought of is to check whether you are passing the steps_key argument to the TensorBoard logger. If you do, but the logged data doesn't contain that key, then I believe no logs will be written to TB.
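That is, I'd expect behaviour like this (a sketch, assuming writes whose data lacks the steps key are silently dropped):

```python
tb = TFSummaryLogger('/tmp/summaries', label='learner', steps_key='learner_steps')
tb.write({'loss': 0.5})                       # no 'learner_steps' key -> likely dropped
tb.write({'loss': 0.4, 'learner_steps': 10})  # written to TB at step 10
```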
Those sound like great suggestions. I'll look into it.
So I've done the following, trying to follow the recent changes to the experiment config. I've replaced the config in run_dqn.py with this code:
```python
logger = lambda label, steps_key, i: AutoCloseLogger(
    TFSummaryLogger(label, summarydir, steps_key=steps_key))

return experiments.Config(
    builder=dqn_builder,
    environment_factory=lambda seed: environment,
    network_factory=lambda spec: network,
    policy_network_factory=dqn.behavior_policy,
    evaluator_factories=[],
    seed=FLAGS.seed,
    max_number_of_steps=FLAGS.num_steps,
    logger_factory=logger)
```
Digging into the code a little, it appears that logger_factory needs to be a function with that signature that returns a fresh logger. There aren't any examples I can work off of to tell whether this is correct, though. However, the logs still appear not to be written.
Adding in some prints, it does appear that what I did sends the data correctly to the TF logger, but I'm not sure why it's not writing.
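In case it helps others debug, a throwaway wrapper along these lines is one way to see what reaches a logger (purely illustrative; not part of acme):

```python
from acme.utils.loggers import base

class PrintLogger(base.Logger):
  """Debug-only wrapper that prints every write before forwarding it."""

  def __init__(self, wrapped: base.Logger):
    self._wrapped = wrapped

  def write(self, values):
    print('logger received:', values)
    self._wrapped.write(values)

  def close(self):
    self._wrapped.close()
```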
OK, I got it working with this factory:
```python
from absl import logging
from acme.utils.loggers import aggregators, base
from acme.utils.loggers.terminal import TerminalLogger
from acme.utils.loggers.tf_summary import TFSummaryLogger

def make_logger(label, steps_key, i):
  # Fan out each write to a terminal logger and a TensorBoard logger.
  terminal_logger = TerminalLogger(label=label, print_fn=logging.info)
  tb_logger = TFSummaryLogger(summarydir, label=label, steps_key=steps_key)
  serialize_fn = base.to_numpy
  logger = aggregators.Dispatcher([terminal_logger, tb_logger], serialize_fn)
  return logger
```
I'm not sure whether this will work with the distributed flags, though.
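For completeness, this factory then plugs into the experiment config from my earlier post in place of the lambda (same assumed setup as above):

```python
return experiments.Config(
    builder=dqn_builder,
    environment_factory=lambda seed: environment,
    network_factory=lambda spec: network,
    policy_network_factory=dqn.behavior_policy,
    evaluator_factories=[],
    seed=FLAGS.seed,
    max_number_of_steps=FLAGS.num_steps,
    logger_factory=make_logger)
```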