hyperdash-sdk-py
Add tensorboard docs
Added basic docs. LMK what you think @richardartoul
When testing with https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/4_Utils/tensorboard_basic.py, I ran into a few issues.
1) The SDK watcher never detects that the TensorFlow experiment has ended, so the run goes on forever.
2) Ending the SDK watcher causes the run to complete. (I guess this is hard to avoid.)
3) However, when you start the SDK watcher again, it creates a new run that also goes on forever, even though no new experiment has been created.
Let's figure out the issues before merging this.
Re 1) I think I looked into this for a while, and there doesn't seem to be a reliable way of detecting when the experiment has ended.
Re 2) Yeah, basically the run has to end or it looks disconnected.
Re 3) There might be a way to implement "resuming", but I think it would be a decent amount of work and involve some caching on disk. I'd rather implement it once people complain and ask for it, TBH.
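For illustration only, here is a minimal sketch of what "caching on disk" for resuming might look like. It assumes a restarted watcher can be identified by the TensorBoard logdir it watches, and it persists a logdir-to-run-ID mapping in a local JSON file. The function names, the JSON cache file, and keying runs by logdir are all hypothetical; this is not how the Hyperdash SDK currently works, and it does not cover how the server would reattach to an existing run.

```python
import json
import os
import uuid


def load_cache(cache_path):
    """Return the cached {logdir: run_id} mapping, or an empty dict if none exists."""
    if not os.path.exists(cache_path):
        return {}
    with open(cache_path, "r") as f:
        return json.load(f)


def save_cache(cache_path, cache):
    """Write to a temp file, then rename, so a crash mid-write can't corrupt the cache."""
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(cache, f)
    os.replace(tmp_path, cache_path)


def get_or_create_run_id(logdir, cache_path):
    """Hypothetical helper: reuse the run ID previously tied to this logdir, if any.

    A restarted watcher that calls this with the same logdir gets the same
    run ID back instead of starting a brand-new run.
    """
    key = os.path.abspath(logdir)
    cache = load_cache(cache_path)
    if key not in cache:
        cache[key] = uuid.uuid4().hex
        save_cache(cache_path, cache)
    return cache[key]
```

Calling `get_or_create_run_id` twice with the same logdir returns the same ID, while a different logdir gets a fresh one. The hard part this sketch skips is deciding when a cached run is stale (e.g. the experiment was rerun from scratch in the same logdir), which is part of why this would be a decent amount of work.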