[Bug] Tensorboard is only available during the training run and not after it has finished
TensorBoard, started from the GUI, seems to work only while a training run is in progress:
- If I press the "Tensorboard" button before pressing "Start Training", the TensorBoard server does not seem to start.
- Pressing "Tensorboard" after "Start Training" works, but:
- As soon as training finishes, the connection to TensorBoard is lost, even if you press the "Tensorboard" button again.
It would be great if TensorBoard could be used easily via the GUI for as long as the GUI window is open.
I can confirm this; I can reproduce it 100% of the time. I thought this was intended behaviour lol
Thank you for confirming.
It is actually entirely intended behaviour. However, I have created an initial PR to address this, which instead ties stopping the server to the launch of a new training run. There are some issues with it, though: it was tied completely to the UI, so I've got to rework my approach. At the moment, however, I've got a few other projects I'm working on.
Yeah, this is quite annoying. If only we could view these graphs inside the actual program window, instead of having to open a whole web browser just to view some graphs...
@Zueuk that's not what this bug is about at all; what you are referring to is an entirely different thing.
Hey guys, here's a quick, hacky temporary solution I'm using right now: you can disable the TensorBoard shutdown by commenting out these lines (around line 778) in GenericTrainer.py.
#self.tensorboard.close()
#if self.config.tensorboard:
#super()._stop_tensorboard()
This is a very hacky solution, though. If you want to run things again, you should kill OneTrainer and the Python process running TensorBoard before subsequent runs, or you'll get a port conflict.
Anyway, this might help at least until SirTrippsalot decides how he wants to approach the fix. Hope it helps, and thanks to SirTrippsalot and the others for all your hard work on this.
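To spot that port conflict before it bites, you could check whether anything is still listening on TensorBoard's port before starting the next run. A minimal sketch, assuming TensorBoard's default port 6006 (change it if you set a different port in OneTrainer); `port_in_use` is a hypothetical helper, not part of OneTrainer:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 6006 is TensorBoard's default port (an assumption about your OneTrainer settings)
    if port_in_use(6006):
        print("Port 6006 is busy: a leftover TensorBoard is probably still running.")
    else:
        print("Port 6006 is free.")
```

If it reports the port as busy, kill the leftover TensorBoard process before launching OneTrainer again.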
IMO the simplest clean solution is to provide a wrapper that uses OneTrainer's venv and path settings and starts a standalone TensorBoard.
I'm a Linux guy, so here's the Linux version.
#!/usr/bin/env bash
. venv/bin/activate
tdir=$(python <<'EOF'
import json
with open("training_presets/#.json") as f:
    jdata = json.load(f)
print(jdata["workspace_dir"] + "/tensorboard")
EOF
)
# host 0.0.0.0 to allow connection from other machines
tensorboard --logdir "$tdir" --host 0.0.0.0
@ppbrown thanks man, I love it! The only caveat might be if you're really short on RAM, as it looks like it adds another 276 MB, which might not be a big deal. But it works, and without all the hacky stuff. I couldn't get a Windows batch version working, so I just did it all in Python. This works on Windows if you put it in a Python file in your OneTrainer root folder and run it. Thanks again, man.
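import json
import os
import subprocess

with open(os.path.join("training_presets", "#.json")) as f:
    jdata = json.load(f)
# os.path.join avoids backslash escapes such as "\t" turning into a tab
logdir = os.path.join(jdata["workspace_dir"], "tensorboard")
subprocess.run([os.path.join("venv", "Scripts", "tensorboard.exe"), "--logdir", logdir, "--host", "0.0.0.0"])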
Interesting. That would only work if you have installed the "tensorboard" module globally in your Python install, though; it is not there by default. For most people, you have to activate the venv first.
Wait... it DOES somehow work for me like that. I don't understand why :-/
But I had to use os.system() to call ./venv/bin/tensorboard; it wouldn't work for me with subprocess.run().
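Both puzzles have standard explanations. The tensorboard launcher inside the venv already has the venv's interpreter baked in (a shebang line on Linux, an embedded path in the .exe on Windows), so calling it directly works without activating the venv. And on POSIX systems, subprocess.run() given a single string without shell=True treats the whole string as the program name, whereas os.system() hands the string to the shell, which splits it into arguments; on Windows a string is passed through as a command line, which is why the Windows version worked. This small Linux-oriented demo uses the current interpreter (sys.executable) as a stand-in for ./venv/bin/tensorboard:

```python
import shlex
import subprocess
import sys

# One string holding a program plus arguments; sys.executable stands in for ./venv/bin/tensorboard
cmd = f'"{sys.executable}" -c "print(42)"'

try:
    # Without shell=True, POSIX looks for a program literally named after the whole string
    subprocess.run(cmd, check=True)
    string_worked = True
except FileNotFoundError:
    string_worked = False

# One list element per argument works on every OS, with no shell involved
result = subprocess.run(shlex.split(cmd), capture_output=True, text=True, check=True)
print(string_worked, result.stdout.strip())
```

On Linux this prints `False 42`: the string form fails, the list form runs. Passing the launcher and its flags as a list, e.g. `["./venv/bin/tensorboard", "--logdir", tdir, "--host", "0.0.0.0"]`, avoids the need for os.system().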
Yep, you can run it with the venv's Python. I named mine tensorrun.py, put it in the OneTrainer root, and just run:
.\venv\Scripts\python.exe tensorrun.py
Well, that's no fun; it needs a GUI-compatible solution. With the Linux "#!/bin/env python" magic, it works. I'm not sure what the Windows equivalent is; I would think just naming it ".py" should be adequate.
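One way to sidestep the interpreter question on both platforms: a wrapper that uses only the standard library and calls the venv's own TensorBoard launcher can be run by any Python 3, so double-clicking it should work wherever .py files are associated with a Python install. A minimal sketch, assuming the training_presets/#.json preset and the workspace_dir/tensorboard layout used in the scripts above, and that the script sits in the OneTrainer root; `tensorboard_command` and `launch` are illustrative names, not part of OneTrainer:

```python
#!/usr/bin/env python3
"""Start a standalone TensorBoard for the workspace in OneTrainer's last-used preset."""
import json
import os
import subprocess
from typing import List


def tensorboard_command(root: str, logdir: str, host: str = "0.0.0.0",
                        windows: bool = os.name == "nt") -> List[str]:
    """Return the argument list that runs the venv's own TensorBoard launcher."""
    if windows:
        exe = os.path.join(root, "venv", "Scripts", "tensorboard.exe")
    else:
        exe = os.path.join(root, "venv", "bin", "tensorboard")
    return [exe, "--logdir", logdir, "--host", host]


def launch(root: str) -> int:
    """Read workspace_dir from training_presets/#.json under root and run TensorBoard."""
    with open(os.path.join(root, "training_presets", "#.json")) as f:
        config = json.load(f)
    logdir = os.path.join(config["workspace_dir"], "tensorboard")
    # Blocks until TensorBoard is stopped (Ctrl+C or closing the console window)
    return subprocess.run(tensorboard_command(root, logdir)).returncode


if __name__ == "__main__":
    # Resolve paths from the script's own location rather than the working directory,
    # since a file manager may start the script from a different directory
    here = os.path.dirname(os.path.abspath(__file__))
    if os.path.exists(os.path.join(here, "training_presets", "#.json")):
        launch(here)
    else:
        print("training_presets/#.json not found; put this script in the OneTrainer root.")
```

Because the script itself never imports tensorboard, it does not matter which Python runs it; only the launcher path has to point into OneTrainer's venv.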
Yeah, a GUI solution would be best. I think the challenge is more a design issue than a code issue; SirTrippsalot mentioned it currently works this way by design. Your solution is a good temporary workaround until they have time to figure out how they want to approach it. Maybe you could change the code to have it run from the Tensorboard button, but to be honest, I haven't really looked through all the code. I just looked for a quick hack because it was annoying to wake up, find the run finished, and be unable to see how it performed. For some reason I didn't even consider just running TensorBoard outside the app.
For Windows, you just need to create a Python file (e.g. tensorrun.py) in the OneTrainer root folder with the following:
import json
import os
import subprocess

with open(os.path.join("training_presets", "#.json")) as f:
    jdata = json.load(f)
logdir = os.path.join(jdata["workspace_dir"], "tensorboard")
subprocess.run([os.path.join("venv", "Scripts", "tensorboard.exe"), "--logdir", logdir, "--host", "0.0.0.0"])
Then run this from the command line in the OneTrainer folder:
.\venv\Scripts\python.exe tensorrun.py
Thanks for the tip on this one, ppbrown. I lurk on the Discord if you ever want to hit me up.
Any UI solution would have to reliably stop TensorBoard on exit, but also still kill it automatically right after a CLI or cloud training run, because otherwise the process never ends. It also has to handle changes to the workspace dir or TensorBoard port while TensorBoard is still running; otherwise, TensorBoard keeps reading from the wrong directory after you start the next training run.
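Those requirements could be sketched roughly as follows. This is a hypothetical helper, not OneTrainer's code: it keeps a single TensorBoard child process, restarts it whenever the log dir or port changes, and stops it at interpreter exit through atexit (which does not run if the process is killed by an unhandled signal, so a CLI or cloud run should still call stop() explicitly). It assumes the caller supplies the command that launches TensorBoard; `--logdir` and `--port` are standard TensorBoard flags.

```python
import atexit
import subprocess
from typing import List, Optional, Tuple


class TensorboardProcess:
    """Hypothetical helper: one long-lived TensorBoard that follows the current settings."""

    def __init__(self, command_prefix: List[str]):
        # e.g. [path to the venv's tensorboard launcher]; flags are appended per launch
        self.command_prefix = command_prefix
        self.process: Optional[subprocess.Popen] = None
        self.settings: Optional[Tuple[str, int]] = None
        # Stop the child when the Python process exits normally (GUI closed, CLI finished)
        atexit.register(self.stop)

    def ensure_running(self, logdir: str, port: int) -> None:
        """Start TensorBoard, or restart it if the logdir or port has changed."""
        if self.process is not None and self.process.poll() is None:
            if self.settings == (logdir, port):
                return  # already serving the right directory on the right port
            self.stop()  # stale settings: free the port before relaunching
        self.process = subprocess.Popen(
            self.command_prefix + ["--logdir", logdir, "--port", str(port)]
        )
        self.settings = (logdir, port)

    def stop(self) -> None:
        """Terminate the child if it is still alive; safe to call more than once."""
        if self.process is not None and self.process.poll() is None:
            self.process.terminate()
            try:
                self.process.wait(timeout=10)
            except subprocess.TimeoutExpired:
                self.process.kill()
                self.process.wait()
        self.process = None
        self.settings = None
```

In this design, the Tensorboard button would call ensure_running() with the current workspace settings, starting a new training run would call it again, and a CLI run would call stop() as soon as training ends.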
My language was imprecise. When I said "GUI", I meant "not command line", i.e. "one click in a file manager".
I didn't mean "integrated into the OneTrainer main program."