Effect of FIM on StarCoder pre-training
Hi!
I'm curious to know some more details about FIM and its effect on the pre-trained model. Here's a paragraph from the SantaCoder paper:
> **FIM for cheap** We observe a minor drop in performance of the FIM model compared to the No-FIM model. Specifically, we see that the pass@100 performance of the FIM model is 2-4% lower on HumanEval and 1% lower on MBPP. While Bavarian et al. (2022) presented evidence for the existence of a FIM-for-free property (i.e., arguing that autoregressive models can be trained with FIM without harming left-to-right capabilities), we do find a small but consistent drop of FIM models on left-to-right text2code benchmarks.
- Was a similar analysis carried out on StarCoder?
- Was StarCoder pre-trained on a 50-50 split between FIM and next-token data? (as indicated in this Megatron script)
Hello, we didn't perform this ablation for StarCoder given the amount of compute it requires for training, but you can check the CodeLlama paper, where the authors observed similar behavior at different scales.
Regarding the FIM percentage, we used 50%.
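
For anyone wondering what a 50% FIM rate means in practice, here is a minimal sketch of the document-level transformation (PSM ordering). It's illustrative only: the names `maybe_apply_fim` and `fim_rate` are made up for this example, and the actual Megatron-LM preprocessing operates on token-id arrays and also supports SPM ordering, padding, and character-vs-token splitting details not shown here.

```python
import random

# StarCoder-style FIM sentinel tokens (PSM ordering).
FIM_PREFIX = "<fim_prefix>"
FIM_MIDDLE = "<fim_middle>"
FIM_SUFFIX = "<fim_suffix>"


def maybe_apply_fim(doc: str, fim_rate: float = 0.5,
                    rng: random.Random = random.Random(0)) -> str:
    """With probability `fim_rate`, rearrange a document into
    prefix/suffix/middle order so the model learns to infill;
    otherwise leave it as a plain left-to-right sample."""
    if rng.random() >= fim_rate:
        return doc  # ordinary next-token (autoregressive) sample
    # Pick two random split points to carve the document into three spans.
    lo, hi = sorted(rng.randrange(len(doc) + 1) for _ in range(2))
    prefix, middle, suffix = doc[:lo], doc[lo:hi], doc[hi:]
    # PSM ordering: the middle span is moved to the end after sentinel
    # tokens, so the model still trains causally but learns to fill in
    # the middle given the surrounding context.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


if __name__ == "__main__":
    print(maybe_apply_fim("def add(a, b):\n    return a + b\n"))
```

So a 50% FIM rate simply means roughly half of the training documents go through this rearrangement while the other half stay as standard left-to-right data.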
I have a question: since it's known that many eval scores drop because of FIM during the pre-training stage, why did you still use FIM at a 50% rate?