
Incremental Training Best Practice

Open Zongshun96 opened this issue 10 months ago • 1 comments

Description

I believe the ease of incremental training is one highlight of VW. However, best practices for incremental training are not obvious from the documentation. (Please kindly point me to the right page if one already exists.)

I am looking for answers to a couple of questions here. I am using VW 8.6.1.

[Figure: F1-scores as new labels are added, comparing "no data replay" and "with data replay"]
  1. I am trying to incrementally train my model with new labels and their corresponding features (most of the new labels and features do not overlap with the already-trained data). However, as I add more labels, the model's F1-scores drop significantly. I had to retrain the model on all of the data it had previously seen to recover the F1-scores. Is this the expected way to do incremental training when introducing new labels? In the figure, "no data replay" shows the F1-scores without retraining, and "with data replay" shows them with retraining.
  2. I was using the csoaa reduction, and the documentation says I should specify the number of labels before training. However, the incremental training step seems to add new classifiers for the newly introduced labels, as suggested by the high F1-scores I got. Is this expected behavior or a bug?
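For context, here is a minimal sketch of the incremental-training workflow I am describing. File names and the label count are illustrative, and I am assuming `--save_resume` is the right flag for preserving training state between invocations:

```shell
# Initial training pass: declare the label count up front for csoaa and
# save a resumable model so training state is kept for later passes.
vw --csoaa 10 -d initial_data.txt -f model.vw --save_resume

# Incremental pass ("no data replay"): load the saved model and
# continue training on the new data only.
vw -i model.vw -d new_data.txt -f model_updated.vw --save_resume

# "With data replay": continue training on old and new data together.
cat initial_data.txt new_data.txt | vw -i model.vw -f model_replayed.vw --save_resume
```

As I understand it, reduction options such as `--csoaa 10` are stored in the model file, so they do not need to be repeated when loading with `-i`.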

Any feedback is appreciated. Thank you!

Zongshun96 avatar Apr 23 '24 02:04 Zongshun96

W.r.t. (1), I'm not surprised to see that retraining tends to be helpful. Online learning algorithms are, to some extent, designed to forget the past in the process of adapting to the present.

W.r.t. (2), there are two different notions of csoaa: one where you need to specify the number of labels up front, and one where you specify a different set of features for each of a variable set of labels. Which do you have in mind? (What are the exact flags you are using?)
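For reference, the two variants take different input formats (the feature names below are illustrative). The first, enabled with `--csoaa <k>`, takes one line per example, with a cost for each label id and a single shared feature set:

```
1:0.0 2:1.0 3:1.0 | feature_a feature_b
```

The second, label-dependent-features variant, enabled with `--csoaa_ldf multiline`, takes one line per candidate label, each with its own features, with examples separated by a blank line:

```
1:0.0 | features_for_label_1
2:1.0 | features_for_label_2
```

With the LDF form, the set of candidate labels can vary from example to example, which is why no label count is declared up front.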

JohnLangford avatar May 09 '24 16:05 JohnLangford