vowpal_wabbit
Avg loss of new model is not less than the avg loss of the old model even when the old model has not seen new data points.
Hi,
The initial model is trained as:
/tmp/vw --data /tmp/data.dat -f /tmp/cb.model --hash all --bit_precision 28 -k --passes 5 --onethread -q CH -q CA -q HA -q RA --cb_explore_adf --cb_type dr --save_resume --holdout_off --cache_file
Then the new model is trained, warm-starting from the old one, as:
/tmp/vw --data /tmp/data.dat -i /tmp/cb.model -f /tmp/cb_new.model --hash all --bit_precision 28 -k --passes 5 --onethread -q CH -q CA -q HA -q RA --cb_explore_adf --cb_type dr --save_resume --holdout_off --power_t 0 --cache_file
Only if the avg loss of the new model on the test data is less than that of the old model do we replace the old model with the new one (following https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/off_policy_evaluation.html).
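For context, the off-policy comparison in that tutorial boils down to an importance-weighted estimate of the new policy's cost on data logged by the old policy. A minimal sketch, with made-up event tuples and a function name that is mine, not VW's API:

```python
# Hypothetical illustration of the IPS (inverse propensity score)
# estimator used in off-policy evaluation; the data are made up.

def ips_value(events):
    """Estimate the avg cost a new policy would incur on logged data.

    Each event is (cost, logged_prob, new_policy_prob): the cost observed
    for the action the old policy took, the probability the old policy
    assigned to that action, and the probability the new policy assigns
    to the same action.
    """
    total = 0.0
    for cost, p_log, p_new in events:
        total += cost * (p_new / p_log)  # reweight by the propensity ratio
    return total / len(events)

# Toy log: the new policy upweights the zero-cost action.
log = [
    (1.0, 0.5, 0.2),  # high-cost action, new policy picks it less often
    (0.0, 0.5, 0.8),  # zero-cost action, new policy picks it more often
]
```

Here `ips_value(log)` comes out below the logged avg cost of 0.5, which is the kind of improvement that would trigger a model swap.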
--power_t 0 is used so that the learning rate is not decreased while retraining the old model.
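The effect of --power_t 0 can be seen in a power-law decay schedule: the example count is raised to the power_t exponent, so a zero exponent keeps the rate constant. A simplified sketch, not VW's exact formula (which also folds in initial_t semantics, per-pass decay, and importance weights):

```python
# Simplified power-law learning-rate decay to illustrate why
# power_t = 0 means "no decay"; this is NOT VW's exact schedule.

def eta(t, eta0=0.5, power_t=0.5, initial_t=1.0):
    """Learning rate after seeing t examples under a power_t decay."""
    return eta0 * (initial_t / (initial_t + t)) ** power_t

rates_decayed  = [eta(t, power_t=0.5) for t in range(5)]  # shrinks with t
rates_constant = [eta(t, power_t=0.0) for t in range(5)]  # stays at eta0
```

With power_t = 0 the decay factor is raised to the zeroth power and every step uses eta0, matching the "learning rate = 0.5" you see throughout retraining.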
We are finding that the old model is never getting replaced by the new model, which should not happen, since the new training data contains points the old model has never seen.
I checked the learning rate, which is 0.5. The output of training the new model shows:
learning rate = 0.5
initial_t = 0
power_t = 0
decay_learning_rate = 1
I feel I am missing something here. Can anybody help?
@JohnLangford @jackgerrits
What is your evaluation method here? With holdout_off and multiple passes, it looks like you are just using (essentially) the training performance, which of course is subject to overfitting. In that context, one effect which can occur is that adding data reduces overfitting which may increase the training loss / decrease the training reward.
@JohnLangford
For evaluation, I split the production data (generated using the old model's policy) into train_data (80%) and test_data (20%). old_model_loss_on_test_data = avg cost on the test_data.
I retrain the old model on the train_data and then evaluate the resulting new model on the test_data, comparing its avg loss against old_model_loss_on_test_data.
Since I hold out the test_data myself, I pass --holdout_off on the command line.
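The protocol described above can be sketched as follows; the helper names and event placeholders are hypothetical stand-ins for the real pipeline, which shells out to vw:

```python
# Sketch of the 80/20 split and promotion rule described above.
# Names and data are illustrative, not part of VW.
import random

def split_events(events, train_frac=0.8, seed=0):
    """Shuffle logged events and split them into train/test portions."""
    events = list(events)
    random.Random(seed).shuffle(events)
    cut = int(len(events) * train_frac)
    return events[:cut], events[cut:]

def should_promote(old_loss_on_test, new_loss_on_test):
    """Replace the old model only if the new one strictly improves."""
    return new_loss_on_test < old_loss_on_test

events = [f"event_{i}" for i in range(10)]
train, test = split_events(events)
```

The strict inequality in should_promote is what keeps the old model in place whenever the retrained model merely ties it on the test set.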
The other obvious possibility is that your data source is nonstationary. Is that plausible? In particular, if you permute your training events and repeat the experiments, what happens?
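One detail when permuting --cb_explore_adf data: each example is a multi-line block separated by a blank line, so the shuffle has to move whole blocks, not individual lines. A sketch (the sample data is made up):

```python
# Shuffle multi-line cb_explore_adf examples as whole blocks.
# Shuffling raw lines would tear examples apart, so split on the
# blank lines that separate ADF examples first.
import random

def shuffle_adf_examples(text, seed=0):
    """Return the input with whole ADF example blocks permuted."""
    blocks = [b for b in text.strip().split("\n\n") if b.strip()]
    random.Random(seed).shuffle(blocks)
    return "\n\n".join(blocks) + "\n\n"

data = (
    "shared |C c1\n0:1.0:0.5 |A a1\n|A a2\n\n"
    "shared |C c2\n|A a1\n1:0.0:0.5 |A a2\n\n"
)
shuffled = shuffle_adf_examples(data)
```

The permuted file contains exactly the same examples, just in a different order, which is what lets this experiment isolate order-dependent (nonstationary) effects.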
@JohnLangford I shuffled the train data and repeated the experiments. But the avg test loss remains the same.
Nonstationarity is plausible, as the distribution of cost(features, action) may change over time. To address this issue, we use --power_t 0, as mentioned in https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/cmd_linear_regression.html.
Is the new model's perf worse than the old model's on the test set when both old and new are trained on permuted data?
If so, it's interesting. If not, it's just saying that non-stationarity is plausibly the culprit.
@JohnLangford It looks like nonstationarity is creating the problem. We are monitoring it. I will update after a week.
@musram Is this issue resolved by identifying the non-stationarity?
@jackgerrits it's resolved.