
feat: [py] Providing Basic support for Tensorboard and Tensorwatch

Open speedhunter001 opened this issue 4 years ago • 4 comments

Usage

  • Created the VWtoTensorboard, VWtoTensorwatchStreamer, VWtoTensorwatchClient and DFtoVWtoTBorTW classes in python/vowpalwabbit/DFtoVWtoTBorTW.py
  • Added an example notebook (.ipynb) in python/examples/iris-df-to-vw-to-tensorboard-tensorwatch
  • Edited pyvw.py to add a .get_label method and modified vw.learn to support Tensorboard and Tensorwatch logs
  • Added Iris.csv to python/examples/iris-df-to-vw-to-tensorboard-tensorboard
  • DFtoVWtoTBorTW creates Tensorboard logs, Tensorwatch logs, or both for vw metrics from a pandas DataFrame, a DFtoVW object and a vw object
  • VWtoTensorboard supports Tensorboard logs for these vw metrics (for now): average_loss, since_last, a graph of the vw reductions, and the arguments as text
  • VWtoTensorwatchStreamer and VWtoTensorwatchClient support Tensorwatch logs for these vw metrics (for now): average_loss, since_last, label, prediction
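import pandas as pd
from vowpalwabbit import pyvw
from vowpalwabbit.DFtoVW import DFtoVW, Feature, MulticlassLabel
# Classes added by this PR (python/vowpalwabbit/DFtoVWtoTBorTW.py)
from vowpalwabbit.DFtoVWtoTBorTW import (
    DFtoVWtoTBorTW, VWtoTensorboard, VWtoTensorwatchClient, VWtoTensorwatchStreamer,
)

df_simple = pd.DataFrame({
    "f1": [1, 2, 3],
    "f2": [1.2, 2.3, 3.4],
    "f3": [0.2, 0.8, 0.01],
    "l": [1, 2, 3],
})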

label = MulticlassLabel(label="l")
features = [Feature(col) for col in ["f1", "f2", "f3"]]

df_to_vw = DFtoVW(df=df_simple, label=label, features=features)
vw = pyvw.vw('--oaa 3 -P 1')

# For tensorboard
logdir = './logs'
vw_to_tb = VWtoTensorboard(logdir)

# For tensorwatch
logfile = './iris.log'
vw_to_tw = VWtoTensorwatchStreamer(logfile)

df_to_tb_tw = DFtoVWtoTBorTW(df_to_vw.convert_df(), vw)
df_to_tb_tw.fit(vw_to_tensorboard=vw_to_tb, vw_to_tensorwatch=vw_to_tw)  

vw_to_tb.draw_reductions_graph(vw)
vw_to_tb.show_args_as_text(vw)

vw_to_tw_client = VWtoTensorwatchClient(logfile)
vw_to_tw_client.plot_metrics()

# Both vw_to_tb and vw_to_tw can be passed to a single .fit() call, as above

Tensorboard logs are created when a VWtoTensorboard instance is passed to the .fit method. Tensorwatch logs are created when a VWtoTensorwatchStreamer instance is passed to the .fit method.

To view the Tensorboard logs, run the command tensorboard --logdir ./logs from the directory that contains the logs folder.

For Tensorwatch visualization, call .plot_metrics on the VWtoTensorwatchClient instance, as above.

Implementation Details

  1. Create a DFtoVW object from a pandas DataFrame
  2. Create a vw object, specifying -P 1 when initializing it
  3. Create a VWtoTensorboard instance, a VWtoTensorwatchStreamer instance, or both
  4. Create a DFtoVWtoTBorTW object, passing the examples from DFtoVW and the vw object as parameters
  5. Call the .fit() method of DFtoVWtoTBorTW to start learning with the vw object. Both logger arguments are optional and default to None: Tensorboard logs are written if a VWtoTensorboard instance is passed, and Tensorwatch logs are written if a VWtoTensorwatchStreamer instance is passed

VWtoTensorboard is used to log metrics for Tensorboard. A .get_label() method has been added to pyvw.py, just before the pyvw.get_prediction implementation, and vw.learn has been modified to support Tensorboard logs (average_loss, since_last, graph of vw reductions, arguments as text) and Tensorwatch logs (average_loss, since_last, label, prediction).

Limitations

  • Currently this only works for vw objects where -P 1 is specified
  • Currently there is no .predict() method for predicting on a data set in DFtoVWtoTensorboard
  • Methods for the following label types do not seem to exist in vw currently, so pyvw.get_label() could not accommodate them: 5: lMAX, 6: lCONDITIONAL_CONTEXTUAL_BANDIT, 7: lSLATES, 8: lCONTINUOUS
  • Tensorboard logs are not created for the label and prediction metrics, because a line plot is not suitable for them in Tensorboard. Tensorwatch offers bar plots, so these metrics are supported there
  • self.finish_example is not called in vw.learn when an example instance is passed as the parameter, so calling vw.learn(ec, vw_to_tb), with ec an example instance and vw_to_tb a VWtoTensorboard instance, gives an error because the metrics are not calculated correctly
  • Label and prediction logs for Tensorwatch are only written when ec is passed as a string

speedhunter001 • May 28 '21 07:05

This pull request introduces 1 alert when merging 2c8642eb77200ad552002acd170c0745c9bb23cf into 6b45001e89ed9b732fee0288d046c155f1c69e6d - view on LGTM.com

new alerts:

  • 1 for Unused import

lgtm-com[bot] • Jun 12 '21 22:06

It seems like we could reasonably implement this externally to the core library? Adding a Tensorboard dependency seems pretty heavy if it can be avoided

jackgerrits • Jan 27 '22 21:01

Hey @jackgerrits - could you expand a bit on how you think it would be layered? Are you thinking of creating a proxy object that drives the underlying vw object? Or were you thinking more along the lines of expecting a specific callback interface into the normal learning code, like we do now, just without specifically typing it, then implementing the TW/TB projections externally?

lokitoth • Feb 10 '22 15:02

I am primarily referring to a sort of proxy object. It seems like the information required for this module to operate is available "externally" to the individual predict and learn calls. Therefore, providing a separate, and importantly optional, module which wraps the necessary predict and learn calls seems to be a more flexible design. Not needing to change pyvw.py at all seems doable to me and is the best outcome I think.

The biggest factor here is that adding tensorboardX and tensorwatch dependencies to the core vowpalwabbit library is not a viable path forward.

jackgerrits • Feb 18 '22 15:02
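
For readers following the thread, the proxy-object idea in the comment above could look roughly like the sketch below. It is not code from this PR or from vowpalwabbit. The class name LoggingLearner, the sink signature and the _StubModel stand-in are all hypothetical. The only assumption about the wrapped object is that it exposes learn(), get_sum_loss() and get_weighted_examples(), which pyvw.vw is expected to provide; average_loss and since_last are derived from those two counters, so pyvw.py itself stays unchanged.

```python
from typing import Callable, Dict, List

# Hypothetical sink signature: called with (step, metrics) after each learn().
Sink = Callable[[int, Dict[str, float]], None]


class LoggingLearner:
    """Hypothetical proxy: forwards learn() to a wrapped model and
    reports metrics to externally supplied sinks."""

    def __init__(self, model, sinks: List[Sink]):
        self._model = model
        self._sinks = sinks
        self._step = 0
        self._prev_loss = 0.0
        self._prev_weight = 0.0

    def learn(self, example) -> None:
        self._model.learn(example)
        self._step += 1
        # Assumed accessors on the wrapped model (duck-typed).
        loss = self._model.get_sum_loss()
        weight = self._model.get_weighted_examples()
        d_weight = weight - self._prev_weight
        metrics = {
            "average_loss": loss / weight if weight else 0.0,
            "since_last": (loss - self._prev_loss) / d_weight if d_weight else 0.0,
        }
        self._prev_loss, self._prev_weight = loss, weight
        for sink in self._sinks:
            sink(self._step, metrics)

    def __getattr__(self, name):
        # Everything else (predict, finish, ...) goes straight to the model.
        return getattr(self._model, name)


class _StubModel:
    """Stand-in for a vw object so the sketch runs without vowpalwabbit.
    It treats each example as its own loss value."""

    def __init__(self):
        self._loss = 0.0
        self._n = 0.0

    def learn(self, example) -> None:
        self._loss += float(example)
        self._n += 1.0

    def get_sum_loss(self) -> float:
        return self._loss

    def get_weighted_examples(self) -> float:
        return self._n


history = []
learner = LoggingLearner(_StubModel(), [lambda step, m: history.append((step, m))])
for ex in [1.0, 0.0, 2.0]:
    learner.learn(ex)
```

With this layering, a Tensorboard sink could be as small as a function that calls add_scalar on a tensorboardX SummaryWriter for each metric, and it would live in an optional package, so the core library would not need the tensorboardX or tensorwatch dependencies.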

Going to go ahead and close this. If you would like to revisit in future please feel free to reopen.

jackgerrits • Feb 29 '24 16:02