Can starcoder2 be trained on a different language like TCL or Lisp?
Hello @loubnabnl, is it possible to get starcoder2 to learn TCL?
It was not part of the 30 languages, so I was curious whether it's worth pursuing with SFT.
Also, is there a FIM script you used for this version of starcoder2?
Hi, the 15B model was trained on 600+ programming languages, including TCL. Here's the full list of languages: https://huggingface.co/datasets/bigcode/the-stack-v2/blob/main/language_stats.csv
The 7B and 3B models, though, were only trained on the 17 languages listed in the paper.
For FIM, it's similar to StarCoder: you can use this code with the right tokens (they're different from SantaCoder's; we use underscores instead of dashes).
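For anyone wondering what "the right tokens" look like in practice, here is a minimal sketch of a fill-in-the-middle (FIM) transform using the underscore-style tokens `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` (SantaCoder used dashes, e.g. `<fim-prefix>`). It is not the StarCoder training script: the real pipeline works on token IDs inside the training code and also supports other orderings, whereas this sketch works on characters, only does PSM (prefix-suffix-middle) ordering, and the `fim_rate=0.5` default is an assumption. The function names `apply_fim` and `build_fim_prompt` are hypothetical.

```python
import random

# FIM special tokens in the StarCoder/StarCoder2 style (underscores, not the
# dashes SantaCoder used, e.g. "<fim-prefix>").
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Inference-time PSM prompt: the model is expected to generate the middle
    after the <fim_middle> token."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def apply_fim(document: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """Hypothetical training-time FIM transform (character-level, PSM only).

    With probability fim_rate, split the document at two random points into
    prefix / middle / suffix and move the middle to the end, after the
    <fim_middle> token. Otherwise return the document unchanged.
    """
    if rng.random() >= fim_rate:
        return document
    # Two cut points in [0, len(document)], sorted so that lo <= hi.
    lo, hi = sorted(rng.randint(0, len(document)) for _ in range(2))
    prefix, middle, suffix = document[:lo], document[lo:hi], document[hi:]
    return build_fim_prompt(prefix, suffix) + middle


if __name__ == "__main__":
    tcl_source = 'proc greet {name} {\n    puts "Hello, $name"\n}\n'
    # A randomly FIM-transformed TCL training sample.
    print(apply_fim(tcl_source, random.Random(0), fim_rate=1.0))
    # A prompt asking the model to fill in the body of a TCL proc.
    print(build_fim_prompt("proc add {a b} {\n    return ", "\n}\n"))
```

Because the middle is moved to the end, an ordinary left-to-right language-modeling loss on the transformed sample teaches the model to infill, which is why the same token layout can be reused at inference time with `build_fim_prompt`.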