CORL
Finetuning algorithms log only train regret
All of the algorithms with offline-to-online finetuning log the training regret (the regret accumulated during the online interactions used for training) under both `train/regret`
and `eval/regret`
. As a result, we effectively report only the train regret, which differs from the Cal-QL work, where the authors report the eval regret. Reporting eval regret is a bit strange, because the quantity we actually want to minimize in practice is the train regret, so this bug is not critical, but it should be kept in mind. I will fix the logging, but without rerunning all of the algorithms due to compute limitations (we may rerun them later).