composable-sft
Language SFT training
For language SFT training for English, did you use the entire Wikipedia or just a subset of it?
We used the entire Wikipedia, but the length of training was less than a full epoch, so in a sense we used a randomly selected subset.
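For reference, a minimal sketch of that setup, assuming a plain Hugging Face `Trainer` MLM run rather than this repo's actual training script: the full English Wikipedia is loaded and shuffled, and `max_steps` caps training well short of one epoch, so the examples actually seen form a random subset. The dataset version, model, and hyperparameters below are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Full English Wikipedia; shuffling means the steps seen before max_steps
# amount to a random subset of the corpus.
wiki = load_dataset("wikipedia", "20220301.en", split="train").shuffle(seed=42)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = wiki.map(tokenize, batched=True, remove_columns=wiki.column_names)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="en-mlm-sft",
    per_device_train_batch_size=32,
    max_steps=100_000,  # illustrative cap: stops training before a full epoch
    save_steps=10_000,
    logging_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```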
Was there any rationale behind stopping early, such as an MLM accuracy criterion, or was the cutoff arbitrary? I am asking because I want to know how much training data is enough for MLM, especially for high-resource languages like English.