CORL
Finetuning algorithms log only train regret
All of the algorithms with offline-to-online finetuning log the training regret (the regret accumulated during the online interactions used for training) under both `train/regret`
and `eval/regret`
. As a result, we effectively report only the train regret, which differs from the Cal-QL work, where the authors report the eval regret. Reporting eval regret is a bit strange, because the quantity we actually want to minimize in practice is the train regret, so this bug is not critical, but it should be kept in mind. I will fix the logging, but without rerunning all of the algorithms due to compute limitations (we may rerun them later).