baselines
About Tensorboard
Hello, I am not sure how to set up TensorBoard. I have set the environment variable, but I run my code from PyCharm, so I do not know how to modify --log-dir.
It shows "Logging to /tmp/openai-2018*". However, I cannot find the directory "/tmp".
I have now found the directory "/tmp". However, I only find 0.0.monitor.csv, log.txt and progress.csv. Where should I find the TensorBoard files?
Hi @Nara0731 ! We have recently added this section to the README: https://github.com/openai/baselines/blob/master/README.md#using-baselines-with-tensorboard
basically, you need to set env variables:
OPENAI_LOGDIR to where you want the tensorboard files to be saved, and
OPENAI_LOG_FORMAT to 'stdout,tensorboard' (if you only need output to command line and tensorboard).
The tensorboard data should show up in OPENAI_LOGDIR (subfolder tb).
You can launch tensorboard via tensorboard --logdir=$OPENAI_LOGDIR
From the fact that logs are saved to the /tmp/openai-2018* location, I suspect that neither of the environment variables is actually set (at least from the python interpreter's perspective). Could you run
import os; print(os.environ)
in python and paste here the output? If OPENAI_LOGDIR and OPENAI_LOG_FORMAT are not there, you can set them directly from python:
os.environ['OPENAI_LOGDIR'] = ...
os.environ['OPENAI_LOG_FORMAT'] = 'stdout,tensorboard'
(that has to happen before you start training) Hope this helps!
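A minimal sketch of the in-script approach described above. The helper name `configure_baselines_logging` is hypothetical; only the two variable names and the format string come from this thread. It has to be called before any training code runs.

```python
import os


def configure_baselines_logging(log_dir, formats=("stdout", "tensorboard")):
    """Set the two logging variables for the current process.

    Hypothetical helper, not part of baselines. Returns the values that
    were set so they can be printed and checked.
    """
    # Resolve "~" and relative paths so the printed location is unambiguous.
    log_dir = os.path.abspath(os.path.expanduser(log_dir))
    os.makedirs(log_dir, exist_ok=True)

    os.environ["OPENAI_LOGDIR"] = log_dir
    # Formats are comma-separated, e.g. "stdout,tensorboard".
    os.environ["OPENAI_LOG_FORMAT"] = ",".join(formats)

    return {k: os.environ[k] for k in ("OPENAI_LOGDIR", "OPENAI_LOG_FORMAT")}
```

Printing the returned dictionary at the top of the script is a quick way to confirm that the interpreter PyCharm launches actually sees both values.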
Yeah, I have set
export OPENAI_LOG_FORMAT='stdout,log,csv,tensorboard' # formats are comma-separated, but for tensorboard you only really need the last one
export OPENAI_LOGDIR=/tmp
Unfortunately, I did not find any tensorboard-related files in "/tmp".
hm... let's solve it one step at a time. Could you run
import os; print(os.environ)
in python?
Yes it show /usr/bin/python3.6 /home/ubuntu/baselines/baselines/run.py Logging to /tmp/openai-2018-09-21-14-19-48-807530 environ({'PATH': '/home/ubuntu/bin:/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin', 'LC_MEASUREMENT': 'zh_CN.UTF-8', 'XAUTHORITY': '/home/ubuntu/.Xauthority', 'XMODIFIERS': '@im=ibus', 'LC_TELEPHONE': 'zh_CN.UTF-8', 'XDG_DATA_DIRS': '/usr/share/ubuntu:/usr/share/gnome:/usr/local/share:/usr/share:/var/lib/snapd/desktop:/var/lib/snapd/desktop', 'GDMSESSION': 'ubuntu', 'MANDATORY_PATH': '/usr/share/gconf/ubuntu.mandatory.path', 'LC_TIME': 'zh_CN.UTF-8', 'GTK_IM_MODULE': 'ibus', 'DBUS_SESSION_BUS_ADDRESS': 'unix:abstract=/tmp/dbus-mKrFfL3RGO', 'DEFAULTS_PATH': '/usr/share/gconf/ubuntu.default.path', 'XDG_CURRENT_DESKTOP': 'Unity', 'LD_LIBRARY_PATH': '/home/ubuntu/.mujoco/mjpro150/bin:/usr/lib/nvidia-384', 'UPSTART_SESSION': 'unix:abstract=/com/ubuntu/upstart-session/1000/1458', 'QT4_IM_MODULE': 'xim', 'LC_PAPER': 'zh_CN.UTF-8', 'SESSION_MANAGER': 'local/ubuntu-pc:@/tmp/.ICE-unix/1708,unix/ubuntu-pc:/tmp/.ICE-unix/1708', 'QT_LINUX_ACCESSIBILITY_ALWAYS_ON': '1', 'LOGNAME': 'ubuntu', 'JOB': 'unity-settings-daemon', 'PWD': '/home/ubuntu/baselines/baselines', 'IM_CONFIG_PHASE': '1', 'PYCHARM_HOSTED': '1', 'LANGUAGE': 'en_US', 'PYTHONPATH': '/home/ubuntu/baselines', 'SHELL': '/bin/bash', 'LC_ADDRESS': 'zh_CN.UTF-8', 'UNITY_HAS_3D_SUPPORT': 'true', 'GIO_LAUNCHED_DESKTOP_FILE': '/usr/share/applications/jetbrains-pycharm-ce.desktop', 'GTK2_MODULES': 'overlay-scrollbar', 'INSTANCE': '', 'OLDPWD': '/home/ubuntu/package/pycharm-community-2018.1.2/bin', 'GNOME_DESKTOP_SESSION_ID': 'this-is-deprecated', 'UPSTART_INSTANCE': '', 'CLUTTER_IM_MODULE': 'xim', 'XDG_SESSION_PATH': '/org/freedesktop/DisplayManager/Session0', 'COMPIZ_BIN_PATH': '/usr/bin/', 'SESSIONTYPE': 'gnome-session', 'XDG_SESSION_DESKTOP': 'ubuntu', 'SHLVL': '0', 'LC_IDENTIFICATION': 'zh_CN.UTF-8', 'LC_MONETARY': 
'zh_CN.UTF-8', 'COMPIZ_CONFIG_PROFILE': 'ubuntu', 'QT_IM_MODULE': 'ibus', 'UPSTART_JOB': 'unity7', 'XDG_CONFIG_DIRS': '/etc/xdg/xdg-ubuntu:/usr/share/upstart/xdg:/etc/xdg', 'LANG': 'en_US.UTF-8', 'GNOME_KEYRING_CONTROL': '', 'XDG_SEAT_PATH': '/org/freedesktop/DisplayManager/Seat0', 'XDG_SESSION_ID': 'c2', 'XDG_SESSION_TYPE': 'x11', 'DISPLAY': ':0', 'UNITY_DEFAULT_PROFILE': 'unity', 'LC_NAME': 'zh_CN.UTF-8', 'GDM_LANG': 'en_US', 'PYTHONIOENCODING': 'UTF-8', 'XDG_GREETER_DATA_DIR': '/var/lib/lightdm-data/ubuntu', 'UPSTART_EVENTS': 'xsession started', 'GPG_AGENT_INFO': '/home/ubuntu/.gnupg/S.gpg-agent:0:1', 'DESKTOP_SESSION': 'ubuntu', 'SESSION': 'ubuntu', 'USER': 'ubuntu', 'XDG_MENU_PREFIX': 'gnome-', 'GIO_LAUNCHED_DESKTOP_FILE_PID': '1996', 'QT_ACCESSIBILITY': '1', 'LC_NUMERIC': 'zh_CN.UTF-8', 'SSH_AUTH_SOCK': '/run/user/1000/keyring/ssh', 'XDG_SEAT': 'seat0', 'PYTHONUNBUFFERED': '1', 'QT_QPA_PLATFORMTHEME': 'appmenu-qt5', 'LD_PRELOAD': '/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-384/libGL.so', 'XDG_VTNR': '7', 'XDG_RUNTIME_DIR': '/run/user/1000', 'HOME': '/home/ubuntu', 'GNOME_KEYRING_PID': ''})
Thanks! Yeah, so one way or another OPENAI_LOGDIR and OPENAI_LOG_FORMAT do not make it into the environment of the python process (neither of them appears in the output above). The fix is really easy - add
import os
os.environ['OPENAI_LOGDIR']='/tmp'
os.environ['OPENAI_LOG_FORMAT']='stdout,tensorboard'
to the very top of your python script, and try running it again. Ideally, the tensorboard files should show up in the /tmp/tb folder. Please let me know if that does not work for you.
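To check whether anything was written, one option is to search the log directory for event files. This is a sketch under the assumption that the files carry "tfevents" in their name, which is the usual TensorBoard convention; `find_event_files` is a hypothetical helper.

```python
import os


def find_event_files(log_dir):
    """Return sorted paths under log_dir whose file name contains 'tfevents'."""
    matches = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            # Assumption: TensorBoard event files have "tfevents" in the name.
            if "tfevents" in name:
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

An empty result for `find_event_files("/tmp")` would mean that no event file exists anywhere under that directory, not only that the tb subfolder is missing.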
I cannot find "/tmp/tb"; I only find "tmp".
okay; could you post here your python code please? Thanks!
Hi, I have the same problem as you.
I solved the problem like this.
Just modify the code at line 209 of run.py:
if MPI is None or MPI.COMM_WORLD.Get_rank() == 0:
    rank = 0
    logger.configure(dir='./log', format_strs=['stdout', 'log', 'csv', 'tensorboard'])
Really? I will try it.
Hi @pzhokhov @smalltingting I configured the logger setting as you have mentioned. I see it created a directory called "tb". However, it is empty. Any idea what is going on?
I am using deepq example but I think it shouldn't matter. This is how I configure it in my code:
def main():
    logger.configure(dir='.log', format_strs=['stdout', 'log', 'csv', 'tensorboard'])
@srivatsankrishnan does the logger print anything on the screen / in the log file? The logger only saves data when logger.dumpkvs() (or logger.dump_tabular()) is called, which by default happens fairly rarely in deepq. Could you try with the --print_freq=1 option?
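To illustrate why an empty tb folder can simply mean that no dump has happened yet, here is a toy stand-in for the buffering behaviour described above. `ToyLogger` is hypothetical and is not the baselines implementation; it only borrows the logkv/dumpkvs naming.

```python
class ToyLogger:
    """Toy model: logkv() buffers values, only dumpkvs() writes them out."""

    def __init__(self):
        self.pending = {}   # values recorded since the last dump
        self.written = []   # stands in for rows written to disk

    def logkv(self, key, value):
        # Overwrites any earlier value for the same key until the next dump.
        self.pending[key] = value

    def dumpkvs(self):
        # Nothing reaches "disk" until this is called.
        if self.pending:
            self.written.append(dict(self.pending))
            self.pending.clear()


log = ToyLogger()
for step in range(1, 101):
    log.logkv("steps", step)
    if step % 50 == 0:      # plays the role of a print frequency
        log.dumpkvs()

print(len(log.written))     # 2 rows: one per dump, not one per step
```

With a short run and an infrequent dump, the loop can end before the first dump, which leaves every output file empty.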
Hi @pzhokhov, the only thing the logger prints on the screen is this message: "Logging to .log"
It creates the following folder structure in .log:
/logs
|------tb
|------log
|------progress
The tb folder is empty. The progress.csv is also empty. The "log" file (the file that gets created inside the directory) basically has the same message that was printed in the console ("Logging to .log"). I tried setting --print_freq=1 but the results are the same.
I tried to hack the code where I create my model (models.py) to explicitly export my graph so I can visualize it in TensorBoard. This is what I use:
tf_writer = tf.summary.FileWriter(LOGDIR)
tf_writer.add_graph(tf.get_default_session().graph)
But the graph is too complex and I can't trace it to my input and output nodes (honestly, I am still trying to make sense of it and have not given up on that yet). I assume the functionality that you enable with the logger for tensorboard will be a more structured or methodical way to visualize it in tensorboard.
Hi @srivatsankrishnan ! Sorry about the lag. If progress.csv is empty, the tb/ subfolder is empty and nothing interesting is printed on the screen, it means that either
- the training did not progress to the point where it would save anything (call logger.dump_tabular()), or
- something bad happened to the logger module
Could you try running a simple test with deepq, for instance:
export OPENAI_LOG_FORMAT=stdout,csv,tensorboard
export OPENAI_LOGDIR=.log
python -m baselines.run --alg=deepq --env=CartPole-v0 --print_freq=1 --num_timesteps=1e5
If everything works correctly, this should generate a long output that looks like:
-----------------------------------
| % time spent exploring | 2 |
| episodes | 843 |
| mean 100 episode reward | 190.8 |
| steps | 99081 |
-----------------------------------
and the files progress.csv, 0.0.monitor.csv, log.txt, and the subfolder tb in .log.
If that works, but your case still does not, it probably means that logger / logger configuration are messed up. If the test above does not work, then something in your python environment is not quite right; and in that case, I'd recommend installing baselines in a clean virtualenv, and trying again.
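For anyone launching from an IDE rather than a shell, the same test can be started from Python with the variables passed explicitly to the child process. This is a sketch: `build_test_run` and `run_test` are hypothetical names, while the command-line flags are the ones quoted above.

```python
import os
import subprocess
import sys


def build_test_run(log_dir=".log"):
    """Return (command, environment) for the deepq CartPole test above."""
    env = dict(os.environ)  # copy, so the parent process is left untouched
    env["OPENAI_LOG_FORMAT"] = "stdout,csv,tensorboard"
    env["OPENAI_LOGDIR"] = log_dir
    cmd = [
        sys.executable, "-m", "baselines.run",
        "--alg=deepq",
        "--env=CartPole-v0",
        "--print_freq=1",
        "--num_timesteps=1e5",
    ]
    return cmd, env


def run_test(log_dir=".log"):
    """Run the test in a child process and return its exit code."""
    cmd, env = build_test_run(log_dir)
    return subprocess.run(cmd, env=env).returncode
```

Passing `env=` to the child removes any dependence on how the parent process was started, which is the failure mode discussed earlier in this thread.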
Hi @pzhokhov! No worries. This one works and I see logs and an event file getting generated. When I open tensorboard, it only has the scalars (% time spent exploring, episodes, rewards etc.). I don't see a graph in tensorboard. I was interested in seeing the graph for the neural net model to determine its input and output nodes. I just hacked the code where I define the model to capture the graph. So in a way, I was able to get what I wanted.
As you put it, in my case I had set just 100 steps for my environment with --print_freq=1 to quickly capture the graph. Maybe it never got to the point where logger.dump_tabular() is called.
On a different note, is there a plan to support saving the model in native tensorflow format along with the graph (.pb)? The reason is that there are lots of interesting tools in tensorflow and they basically require the model in one of these formats.
oh now I see :) Yeah, the logger only saves scalars. As for long-term support of saving entire models in tensorflow / tensorboard support and serialization in general - this has been a subject of quite a bit of debate. We will likely support custom serialization functions (so that every use case can pick its poison), but I don't have a timeline for that. If you could provide an example of useful functionality that is missing by not saving data in tensorflow format, we can speed it up somewhat :)
Hi @pzhokhov, thanks for your reply. There are lots of tools in tensorflow to fine-tune inference performance: (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/tools) (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md) The basic requirement for using these tools is to have models saved in native tensorflow format (checkpoints, .pb etc). I am particularly interested in using these tools and was able to hack the code to save the model in the native tensorflow format. I am currently facing some tensorflow-related issues but will be able to test it out once I resolve those.
I have one more useful functionality in mind but it's orthogonal to this discussion. Maybe I will open a new issue for it to avoid mixing it up with this one.
Was there any resolution to this issue? I've tried the same suggestions that have been listed so far (os.environ['OPENAI_LOGDIR'] = ... and os.environ['OPENAI_LOG_FORMAT'] = 'stdout,tensorboard') and I can get those to be listed by print(os.environ), but I am not getting any file outputs. Any ideas?
I just wanted to know how to move the logs from the /tmp directory to a directory of my choice, as I currently have to manually save the /tmp/openai-2022..... files to get the checkpoints for training.
PS: I am using multiple GPUs for training; I hope the suggested methods also work for multi-GPU training.
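Going by the earlier replies, setting OPENAI_LOGDIR before training starts should keep new runs out of /tmp altogether. For runs that already exist, a small script can copy the run directories somewhere permanent. This is a sketch: `copy_run_dirs` is a hypothetical helper, the `openai-*` pattern is inferred from the directory names quoted in this thread, and nothing here has been checked against multi-GPU runs.

```python
import glob
import os
import shutil


def copy_run_dirs(src_root, dest_root, pattern="openai-*"):
    """Copy every run directory matching pattern from src_root to dest_root.

    Directories that already exist at the destination are skipped.
    Returns the list of newly created destination paths.
    """
    os.makedirs(dest_root, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(os.path.join(src_root, pattern))):
        if not os.path.isdir(src):
            continue
        dest = os.path.join(dest_root, os.path.basename(src))
        if os.path.exists(dest):
            continue  # already copied on an earlier call
        shutil.copytree(src, dest)
        copied.append(dest)
    return copied
```

Copying rather than moving keeps the originals in place until the copies have been verified.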