vowpal_wabbit
Avg loss of new model is not less than the avg loss of the old model even when the old model has not seen new data points.
Hi,
The initial model is trained as:
/tmp/vw --data /tmp/data.dat -f /tmp/cb.model --hash all --bit_precision 28 -k --passes 5 --onethread -q CH -q CA -q HA -q RA --cb_explore_adf --cb_type dr --save_resume --holdout_off --cache_file
Then the new model is trained, warm-starting from the old one, as:
/tmp/vw --data /tmp/data.dat -i /tmp/cb.model -f /tmp/cb_new.model --hash all --bit_precision 28 -k --passes 5 --onethread -q CH -q CA -q HA -q RA --cb_explore_adf --cb_type dr --save_resume --holdout_off --power_t 0 --cache_file
Only if the avg loss of the new model on the test data is less than that of the old model do we replace the old model with the new one (following https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/off_policy_evaluation.html).
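For context, the off-policy comparison in that tutorial boils down to an importance-weighted estimate of the new policy's cost on data logged by the old policy. A minimal sketch, with made-up event tuples and a function name that is mine, not VW's API:

```python
# Hypothetical illustration of the IPS (inverse propensity score)
# estimator used in off-policy evaluation; the data are made up.

def ips_value(events):
    """Estimate the avg cost a new policy would incur on logged data.

    Each event is (cost, logged_prob, new_policy_prob): the cost observed
    for the action the old policy took, the probability the old policy
    assigned to that action, and the probability the new policy assigns
    to the same action.
    """
    total = 0.0
    for cost, p_log, p_new in events:
        total += cost * (p_new / p_log)  # reweight by the propensity ratio
    return total / len(events)

# Toy log: the new policy upweights the zero-cost action.
log = [
    (1.0, 0.5, 0.2),  # high-cost action, new policy picks it less often
    (0.0, 0.5, 0.8),  # zero-cost action, new policy picks it more often
]
```

Here `ips_value(log)` comes out below the logged avg cost of 0.5, which is the kind of improvement that would trigger a model swap.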
--power_t 0 is used so that the learning rate is not decreased while retraining the old model.
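The effect of --power_t 0 can be seen in a power-law decay schedule: the example count is raised to the power_t exponent, so a zero exponent keeps the rate constant. A simplified sketch, not VW's exact formula (which also folds in initial_t semantics, per-pass decay, and importance weights):

```python
# Simplified power-law learning-rate decay to illustrate why
# power_t = 0 means "no decay"; this is NOT VW's exact schedule.

def eta(t, eta0=0.5, power_t=0.5, initial_t=1.0):
    """Learning rate after seeing t examples under a power_t decay."""
    return eta0 * (initial_t / (initial_t + t)) ** power_t

rates_decayed  = [eta(t, power_t=0.5) for t in range(5)]  # shrinks with t
rates_constant = [eta(t, power_t=0.0) for t in range(5)]  # stays at eta0
```

With power_t = 0 the decay factor is raised to the zeroth power and every step uses eta0, matching the "learning rate = 0.5" you see throughout retraining.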
We are finding that the old model is never getting replaced by the new model, which should not happen, since the new training data contains points the old model has never seen.
I checked the learning rate, which is 0.5. The output of training the new model shows:
learning rate = 0.5
initial_t = 0
power_t = 0
decay_learning_rate = 1
I feel I am missing something here. Can anybody help?
@JohnLangford @jackgerrits
What is your evaluation method here? With holdout_off and multiple passes, it looks like you are just using (essentially) the training performance, which of course is subject to overfitting. In that context, one effect which can occur is that adding data reduces overfitting which may increase the training loss / decrease the training reward.
@JohnLangford
For evaluation, I split the production data (generated using the old model's policy) into train_data (80%) and test_data (20%). old_model_loss_on_test_data = avg cost on the test_data.
I retrain the old model on the train_data and then evaluate the resulting new model on the test_data, comparing its avg loss against old_model_loss_on_test_data.
Since I hold out the test_data myself, I pass --holdout_off on the command line.
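The protocol described above can be sketched as follows; the helper names and event placeholders are hypothetical stand-ins for the real pipeline, which shells out to vw:

```python
# Sketch of the 80/20 split and promotion rule described above.
# Names and data are illustrative, not part of VW.
import random

def split_events(events, train_frac=0.8, seed=0):
    """Shuffle logged events and split them into train/test portions."""
    events = list(events)
    random.Random(seed).shuffle(events)
    cut = int(len(events) * train_frac)
    return events[:cut], events[cut:]

def should_promote(old_loss_on_test, new_loss_on_test):
    """Replace the old model only if the new one strictly improves."""
    return new_loss_on_test < old_loss_on_test

events = [f"event_{i}" for i in range(10)]
train, test = split_events(events)
```

The strict inequality in should_promote is what keeps the old model in place whenever the retrained model merely ties it on the test set.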
The other obvious possibility is that your data source is nonstationary. Is that plausible? In particular, if you permute your training events and repeat the experiments, what happens?
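One detail when permuting --cb_explore_adf data: each example is a multi-line block separated by a blank line, so the shuffle has to move whole blocks, not individual lines. A sketch (the sample data is made up):

```python
# Shuffle multi-line cb_explore_adf examples as whole blocks.
# Shuffling raw lines would tear examples apart, so split on the
# blank lines that separate ADF examples first.
import random

def shuffle_adf_examples(text, seed=0):
    """Return the input with whole ADF example blocks permuted."""
    blocks = [b for b in text.strip().split("\n\n") if b.strip()]
    random.Random(seed).shuffle(blocks)
    return "\n\n".join(blocks) + "\n\n"

data = (
    "shared |C c1\n0:1.0:0.5 |A a1\n|A a2\n\n"
    "shared |C c2\n|A a1\n1:0.0:0.5 |A a2\n\n"
)
shuffled = shuffle_adf_examples(data)
```

The permuted file contains exactly the same examples, just in a different order, which is what lets this experiment isolate order-dependent (nonstationary) effects.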
@JohnLangford I shuffled the train data and repeated the experiments. But the avg test loss remains the same.
Nonstationarity is plausible, as the distribution of cost(features, action) may change over time. To address this issue, we use --power_t 0, as mentioned in https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/cmd_linear_regression.html.
Is the new model's perf worse than the old model's on the test set when both old and new are trained on permuted data?
If so, it's interesting. If not, it's just saying that non-stationarity is plausibly the culprit.
@JohnLangford It looks like nonstationarity is creating the problem. We are monitoring it. I will update after a week.
@musram Is this issue resolved by identifying the non-stationarity?
@jackgerrits it's resolved.