Example of using the TFSummaryLogger to use acme with TensorBoard
I'm attempting to use some of the baseline examples (e.g., the DQN agent trained on Atari) with TensorBoard. I found the TFSummaryLogger and attempted to replace the default logger with it. This creates the tfrecord file in the expected place, but does not populate it with results. I also attempted to replicate what the default logger does, substituting the TFSummaryLogger for the CSVLogger. This gave the same result.
Is there a working example of using the TFSummaryLogger with acme, i.e. one that generates artifacts that can be viewed in TensorBoard?
It's interesting that this didn't work out of the box. I suspect the issue is with buffered writes. Can you try closing the logger when you finish writing, to see whether the results then appear?
Sure, I can flush the logger before terminating. Digging in a little, I don't think the logger is being flushed at all in the training loop. There is exactly one mention of the TFSummaryLogger in the repo, and it's the definition.
@rdevon The flushing should be done automatically by TB, but maybe that's not working in your case. There is an auto-close wrapper, https://github.com/deepmind/acme/blob/master/acme/utils/loggers/auto_close.py, that you can wrap around the logger; it should close the logger on exit, which flushes the TB writer.
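Something like this, for instance (a minimal sketch; the import paths are my assumption based on the file above, and the label and steps key are just illustrative):

```python
from acme.utils.loggers.auto_close import AutoCloseLogger
from acme.utils.loggers.tf_summary import TFSummaryLogger

# Wrap the TB logger so that closing the wrapper (or letting it be
# garbage collected at exit) closes, and therefore flushes, the writer.
logger = AutoCloseLogger(
    TFSummaryLogger('/tmp/summaries', label='learner', steps_key='learner_steps'))
logger.write({'loss': 0.5, 'learner_steps': 1})
```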
If this still doesn't work, one other thing I just thought of is to check whether you are passing the steps_key argument to the TensorBoard logger. If you do, but the logged data doesn't contain that key, then I believe no logs will be written to TB.
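That is, I'd expect behaviour like this (a sketch, assuming writes whose data lacks the steps key are silently dropped):

```python
tb = TFSummaryLogger('/tmp/summaries', label='learner', steps_key='learner_steps')
tb.write({'loss': 0.5})                       # no 'learner_steps' key -> likely dropped
tb.write({'loss': 0.4, 'learner_steps': 10})  # written to TB at step 10
```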
Those sound like great suggestions. I'll look into it.
So I've done the following, trying to follow the recent changes to the experiment config. I've replaced the config in run_dqn.py with this code:
```python
logger = lambda label, steps_key, i: AutoCloseLogger(
    TFSummaryLogger(label, summarydir, steps_key=steps_key))

return experiments.Config(
    builder=dqn_builder,
    environment_factory=lambda seed: environment,
    network_factory=lambda spec: network,
    policy_network_factory=dqn.behavior_policy,
    evaluator_factories=[],
    seed=FLAGS.seed,
    max_number_of_steps=FLAGS.num_steps,
    logger_factory=logger)
```
Digging into the code a little, it appears that logger_factory needs to be a function with that signature that returns a fresh logger. There aren't any examples I can work off of to tell whether this is correct, though. However, the logs still appear not to be written.
Adding in some prints, it does appear that what I did sends the data correctly to the TF logger, but I'm not sure why it's not writing.
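In case it helps others debug, a throwaway wrapper along these lines is one way to see what reaches a logger (purely illustrative; not part of acme):

```python
from acme.utils.loggers import base

class PrintLogger(base.Logger):
  """Debug-only wrapper that prints every write before forwarding it."""

  def __init__(self, wrapped: base.Logger):
    self._wrapped = wrapped

  def write(self, values):
    print('logger received:', values)
    self._wrapped.write(values)

  def close(self):
    self._wrapped.close()
```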
OK, I got it working with this factory:
```python
from absl import logging
from acme.utils.loggers import aggregators, base
from acme.utils.loggers.terminal import TerminalLogger
from acme.utils.loggers.tf_summary import TFSummaryLogger

def make_logger(label, steps_key, i):
  # Fan out each write to a terminal logger and a TensorBoard logger.
  terminal_logger = TerminalLogger(label=label, print_fn=logging.info)
  tb_logger = TFSummaryLogger(summarydir, label=label, steps_key=steps_key)
  serialize_fn = base.to_numpy
  logger = aggregators.Dispatcher([terminal_logger, tb_logger], serialize_fn)
  return logger
```
I'm not sure whether this will work with the distributed flags, though.
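For completeness, this factory then plugs into the experiment config from my earlier post in place of the lambda (same assumed setup as above):

```python
return experiments.Config(
    builder=dqn_builder,
    environment_factory=lambda seed: environment,
    network_factory=lambda spec: network,
    policy_network_factory=dqn.behavior_policy,
    evaluator_factories=[],
    seed=FLAGS.seed,
    max_number_of_steps=FLAGS.num_steps,
    logger_factory=make_logger)
```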