
A scaling law of instruction-code-data would be very interesting...

Open yucc-leon opened this issue 1 year ago • 3 comments

I just started reproducing Magicoder and couldn't help wondering: would a bigger OSS-Instruct dataset work better, and if so, how much better? P.S. There are 12,000,000 Python files in bigcode/Starcoderdata, and only 40K of those 12M are being used.

yucc-leon avatar Feb 23 '24 07:02 yucc-leon

Cool! Yes, I agree it'd be very interesting to see how it goes when the dataset scales

UniverseFly avatar Feb 26 '24 19:02 UniverseFly

> Cool! Yes, I agree it'd be very interesting to see how it goes when the dataset scales

So... are you planning to do this work soon?

yucc-leon avatar Feb 28 '24 10:02 yucc-leon

Yeah, we are doing some relevant follow-ups. Stay tuned for future updates :)

UniverseFly avatar Feb 28 '24 13:02 UniverseFly