
Allow the user to pass a separate file for validation (rather than using heldout parameter)

Open andreacimino opened this issue 3 years ago • 2 comments

Short description

Allow the vw command line to accept a heldout input file during training rather than sampling the heldout set from the training data.

How this suggestion will help you/others

Suppose you have a training data file with labeled examples ordered by time. In some cases, data points belonging to the same user appear in consecutive examples. This makes validation during training uninformative, since there is a high chance that examples from the same user "fall" into both the training data and the heldout data. By allowing the user to specify an external file for the heldout process, the user would have much more control over the training process.

andreacimino avatar Feb 11 '22 05:02 andreacimino

@JohnLangford correct me if I'm wrong but I believe the standard way to achieve this would be to:

  1. Train and save a model with your input file using --holdout_off and --final_regressor out.vw
  2. Load this model and process your test set in test only mode --initial_regressor out.vw --testonly

jackgerrits avatar Feb 11 '22 13:02 jackgerrits

Thanks for the answer,

The point is that I would like to see the performance achieved during training (in a multi-pass setting) rather than only at the end of training. It seems that the --save_per_pass option lets me achieve what I need, since I can test the model saved after each pass on the test set.
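A rough sketch of that per-pass workflow is below. It rests on some assumptions not stated in this thread: that --save_per_pass writes one model per pass named after the --final_regressor path with the pass index appended (`out.vw.0`, `out.vw.1`, ...), and that multi-pass training is run with --passes and a cache (-c). The file names and helper functions are hypothetical, and the script only prints the commands.

```python
def multipass_train_command(train_file, model_file, passes):
    """Build the vw argv for multi-pass training that saves a model after every pass."""
    return [
        "vw",
        "--data", train_file,
        "--holdout_off",
        "--passes", str(passes),
        "-c",                  # multiple passes need a cache file
        "--save_per_pass",     # save a model at the end of each pass
        "--final_regressor", model_file,
    ]


def per_pass_test_commands(test_file, model_file, passes):
    """Build one test-only vw argv per saved pass.

    ASSUMPTION: per-pass models are named "<model_file>.<pass index>",
    starting at 0. Check the files vw actually writes before relying on this.
    """
    return [
        ["vw", "--data", test_file,
         "--initial_regressor", f"{model_file}.{i}",
         "--testonly"]
        for i in range(passes)
    ]


if __name__ == "__main__":
    # Hypothetical file names and pass count, chosen only for this example.
    print(" ".join(multipass_train_command("train.txt", "out.vw", 3)))
    for argv in per_pass_test_commands("validation.txt", "out.vw", 3):
        print(" ".join(argv))
```

Comparing the loss vw reports for each test-only run gives a per-pass validation curve on a file of your choosing, at the cost of one extra vw invocation per pass.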

andreacimino avatar Feb 11 '22 14:02 andreacimino

Closing this as it is currently not on our roadmap. Feel free to reopen.

olgavrou avatar Sep 22 '22 15:09 olgavrou