songbird icon indicating copy to clipboard operation
songbird copied to clipboard

Tensorboard and qiime2

Open mortonjt opened this issue 5 years ago • 10 comments

At the moment, tensorboard is disabled when running the qiime2 songbird command. The original motivation was because an additional logging directory needs to be created in order to enable this.

I'm not sure how qiime2 is expected to handle logging directories - should they be typed?

Thoughts @fedarko @ebolyen?

mortonjt avatar Sep 17 '19 16:09 mortonjt

Is the goal to keep this directory around? Creating a well managed temporary directory is just fine from QIIME 2's perspective (dumping things into the current working directory or specific locations is usually where we take issue).

ebolyen avatar Sep 17 '19 17:09 ebolyen

Right, the goal would be to create a logging directory that can be viewed through Tensorboard after the run is completed. This can help users debug their runs and compare have different runs differ (i.e. compare and contrast different covariates, priors, ...).

PR #71 currently creates this logging dir in the current directory (it has been disabled previously).

mortonjt avatar Sep 17 '19 18:09 mortonjt

I see! We don't have an equivalent concept, but @thermokarst and I have been chatting recently about "diagnostic" outputs. This could help inform that design. From poking around, it looks like tensorboard is a server, and probably requests data from the logging directory as needed. So it doesn't fit into a full visualization on the surface.

ebolyen avatar Sep 17 '19 18:09 ebolyen

Does #71 (just writing to a logging directory, ignoring the standard notion of "output" in QIIME 2) have any actual downsides, aside from maybe not being the most elegant solution? Using future functionality baked in to QIIME 2 to handle this would be super nice, but IMO what's really important for now is making these diagnostics available to the user—to my understanding, this is pretty much required to make running Songbird (or mmvec?) useful, and while I can't speak on mmvec this isn't available when running qiime songbird right now.

One quick way to avoid the problem of dumping stuff into the current working directory (sort of :) might be adding a required --p-diagnostic-directory parameter (ideally specified relative to the CWD), and then having tensorflow output diagnostics to there. I recognize that this doesn't really fit with the current paradigm of having QIIME 2 take care of the I/O, but my vote would be strongly for a temporary solution like that instead of letting the problem sit.

fedarko avatar Sep 17 '19 20:09 fedarko

In lack of a better framework-based solution. Would it make sense to only create this "diagnostic directory" under the current directory if the user activates the --verbose flag? The directory could be automatically named using the parameters for the model, and if --verbose is not used then no directory is created.

On (Sep-17-19| 9:05), Jamie Morton wrote:

At the moment, tensorboard is disabled when running the qiime2 songbird command. The original motivation was because an additional logging directory needs to be created in order to enable this.

I'm not sure how qiime2 is expected to handle logging directories - should they be typed?

Thoughts @fedarko @ebolyen?

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/biocore/songbird/issues/70

ElDeveloper avatar Sep 17 '19 20:09 ElDeveloper

RE @fedarko / @ElDeveloper , that's basically what the standalone CLI is currently doing - if the user specifies a logging directory, it'll save Tensorboard files there. Otherwise, it'll automatically create a folding named by timestamp.

The main reason why I didn't push this into the q2 CLI this seems to go against q2 standards - all inputs / outputs need to be modularized in some shape or form.

Another work around is to have another output type (i.e. tensorboard type) that can be unzipped and visualized into Tensorboard. The diagnostics is just one big json - so it shouldn't be too big of an issue to just have a type for that.

Then it'll raise the question, where will this type live? Will this be in main q2_types? Or should this be in an auxiliary repo? This will be something that all packages using pytorch / Tensorflow will likely rely on.

mortonjt avatar Sep 17 '19 20:09 mortonjt

The diagnostics is just one big json - so it shouldn't be too big of an issue to just have a type for that.

Wait, really? In my experience running Songbird outside of QIIME 2, I always got a folder with 5 files (not counting the output differentials), and none of these files are JSON. (Attaching screenshot of what this looks like. The events.out.tfevent... file is by far the biggest file here, weighing in at 992 MB.) Screen Shot 2019-09-17 at 1 51 16 PM

If saving tensorboard stuff without doing it through Q2 is really really really not kosher, then an alternative would be not following through with #63 and instead improving the qiime songbird summarize-single command to make its visualization a reasonable-enough approximation of tensorboard (at minimum, adding like a sentence next to each plot that says "this is what this plot represents"—would probably be possible to straight-up copy this from the README's FAQ). This doesn't have to be fancy, but it would enable basic diagnostics in a Q2-compliant way.

Regardless of which solution people want (just writing TF results to a logging dir, making TF output a Q2 type somehow, improving the current summary command to produce a visualization with adequate labels and context), I'm happy to help with the corresponding implementation. Just let me know.

fedarko avatar Sep 17 '19 21:09 fedarko

I misremembered the actual format -- that can be found here.

But yes, the events.out* contains all of the logging information -- the other files are for checkpointing (which we often don't need to use in songbird -- that's only for jobs that takes days to run).

Regarding your comment on the summary commands - that was the initial purpose. Do you have specific issues that you ran into? Feel free to raise issues so that we can start patching them.

mortonjt avatar Sep 17 '19 21:09 mortonjt

If we can hammer out #72 satisfactorily, then I think kicking the can down the road on getting Tensorboard + Q2 more tightly integrated is ok (...I suppose it's not really my call anyway :). Thanks all for the discussion, and sorry for the rants.

fedarko avatar Sep 18 '19 00:09 fedarko

So I looked more closely at how tensorboard works, and while its not beyond the realm of possibility to static-ify the program (see https://github.com/tensorflow/tensorboard/issues/2045; there is a jupyter client, which expects a backend, but like described in the issue, we have a way to "cache" those requests in QZVs if we can change the host for the embedded client). That seems like a lot of work and is kind of brittle.

Pragmatically I am fine with a --p-diagnostic-dir or --p-tensorboard-log however supersystems (like Qiita) would probably prefer that their permanent filesystem is not be arbitrarily changed, which is why we so strongly discourage that kind of thing.

I am of course all for having nicer visualizations as well (I might recommend Vega here). Also, there's no reason we couldn't have our cake and eat it too, we could put the log directory into the visualization, so that savvy users can still find it, it's still attached to provenance, and a sufficiently advanced visualization can read directly from it in an ad-hoc way.

ebolyen avatar Sep 18 '19 15:09 ebolyen