
Can starcoder2 be trained with a different language like TCL or lisp?

Open cmosguy opened this issue 1 year ago • 1 comments

Hello @loubnabnl, is it possible to get starcoder2 to learn TCL?

It was not among the 30 languages, so I was curious whether it's worth pursuing with SFT.

Also, is there a FIM script you used for this version of starcoder2?

cmosguy • Mar 05 '24 02:03

Hi, the 15B model was trained on 600+ programming languages, including TCL. Here's the full list of languages: https://huggingface.co/datasets/bigcode/the-stack-v2/blob/main/language_stats.csv

The 7B and 3B models, though, were trained on only the 17 languages listed in the paper.

For FIM, it's similar to StarCoder: you can use this code with the right tokens (they're different from SantaCoder's; we use underscores instead of dashes).

loubnabnl • Mar 05 '24 16:03
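
To make the FIM token remark concrete, here is a minimal sketch of how a fill-in-the-middle string might be assembled with the underscore-style tokens. It assumes the StarCoder-style sentinels `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` and the prefix-suffix-middle (PSM) ordering; verify both against the tokenizer of the checkpoint you actually use. The helper names `build_fim_prompt` and `make_fim_example` are hypothetical and are not taken from the script referenced above.

```python
# Assumed StarCoder-style sentinel tokens (underscores, not the dashes
# used by SantaCoder). Check the tokenizer's special tokens before relying on them.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Inference-time prompt in PSM order; the model generates the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def make_fim_example(code: str, start: int, end: int) -> str:
    """Training-time string: cut code[start:end] out as the middle span
    and append it after the <fim_middle> sentinel."""
    if not 0 <= start <= end <= len(code):
        raise ValueError("start/end must satisfy 0 <= start <= end <= len(code)")
    prefix, middle, suffix = code[:start], code[start:end], code[end:]
    return build_fim_prompt(prefix, suffix) + middle


if __name__ == "__main__":
    # A small TCL snippet with the procedure body left for the model to fill in.
    prompt = build_fim_prompt("proc add {a b} {\n    ", "\n}\n")
    print(prompt)
```

For SFT on a language such as TCL, strings produced by something like `make_fim_example` would be mixed with ordinary left-to-right samples; the mixing rate and the way split points are chosen are design choices this sketch does not cover.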