cchenv
Hi @shayne-longpre, thanks for labeling all the licenses in the Flan Collection! I'm a bit confused about the Flan-T5 models' Apache-2.0 license, i.e., if some datasets in the Flan Collection...
Hi @cinjon, did you figure it out? It's confusing. Also, the actual batch size seems to be 256 (2 * 8 * 16), so there should be about 232 steps...
@cinjon I tried to use TRL's implementation (https://huggingface.co/docs/trl/cpo_trainer#simple-preference-optimization-simpo) for training runs, but I cannot reproduce their Gemma2-9B-it-SimPO model. The resulting model after 1 epoch on the dataset is so much...
Hi @yumeng5, thanks for adding your implementation to TRL! I just have a few quick questions. 1. The existing implementation uses CPOTrainer for training SimPO models. However, by using this...