magicoder
A scaling law for code instruction data would be very interesting...
I just started reproducing Magicoder and couldn't help wondering: would a bigger OSS-Instruct dataset work better, and if so, how much better? PS: there are about 12 million Python files in bigcode/Starcoderdata, and only about 40K of them are used.
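For scale, 40K out of 12M is roughly 0.33% of the available Python files. One way to probe the question would be to generate OSS-Instruct data from nested random subsets of seed files at increasing sizes, so that dataset size is the only thing that changes between runs. Below is a minimal sketch, assuming each seed file is identified by its index in the Python portion of bigcode/Starcoderdata. The helper names (`usage_fraction`, `nested_subsets`) are hypothetical and are not part of the Magicoder codebase.

```python
import random

TOTAL_PYTHON_FILES = 12_000_000  # approximate count quoted in the question
USED_SEED_FILES = 40_000         # approximate number of files used (per the question)


def usage_fraction(used: int, total: int) -> float:
    """Fraction of the available files that were actually used as seeds."""
    return used / total


def nested_subsets(population_size: int, sizes: list[int], seed: int = 0) -> dict[int, list[int]]:
    """Hypothetical helper: draw nested random index subsets.

    One random ordering is sampled once, and each subset is a prefix of it,
    so every smaller subset is contained in every larger one.
    """
    rng = random.Random(seed)
    largest = max(sizes)
    # Sample without replacement; random.sample handles a range efficiently.
    order = rng.sample(range(population_size), largest)
    return {n: order[:n] for n in sorted(sizes)}


if __name__ == "__main__":
    print(f"{usage_fraction(USED_SEED_FILES, TOTAL_PYTHON_FILES):.2%}")
    # Doubling sizes: 40K, 80K, 160K, 320K seed files.
    sizes = [USED_SEED_FILES * 2**k for k in range(4)]
    subsets = nested_subsets(TOTAL_PYTHON_FILES, sizes)
    for n, idx in subsets.items():
        print(n, len(idx))
```

Nesting the subsets is a design choice: with independent draws, differences between runs would mix the effect of size with the effect of which particular files were sampled.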
Cool! Yes, I agree it'd be very interesting to see how it goes as the dataset scales up.
So... are you guys planning to do this kind of work soon?
Yeah, we are working on some related follow-ups. Stay tuned for future updates :)