78k evolved code instructions

Open ahong007007 opened this issue 1 year ago • 10 comments

Hi WizardCoder team,

Is the dataset (78k evolved code instructions) available for download? Thank you very much.

ahong007007 avatar Jun 15 '23 08:06 ahong007007

+1 to this

mtisz avatar Jun 15 '23 21:06 mtisz

+1 to this

li-xl avatar Jun 16 '23 11:06 li-xl

+1 to this

CyberTimon avatar Jun 17 '23 17:06 CyberTimon

Thank you for your wonderful work. The paper reports a data volume of 78k, of which 20K comes from Alpaca. Where does the rest of the instruction data come from?

ahong007007 avatar Jun 18 '23 14:06 ahong007007

+1, it feels like Evol-Instruct is even more useful for code generation than for normal chat

Symbolk avatar Jun 19 '23 08:06 Symbolk

+1

fredi-python avatar Jun 19 '23 09:06 fredi-python

+1 to this

BlackBearBiscuit avatar Jun 20 '23 09:06 BlackBearBiscuit

+1

GanjinZero avatar Jun 22 '23 15:06 GanjinZero

+1

sri-hk avatar Jun 26 '23 07:06 sri-hk

+1

fuanan avatar Jul 05 '23 09:07 fuanan

I have created an open-source version of the dataset. It took me 120,000 API calls over 3 days. The major caveat is that I didn't do much post-processing, as they didn't explain their process in the paper, so this uncleaned version of my dataset may not match the performance reported in the paper. Feel free to use the following (it's also on the HF Hub):

https://github.com/nickrosh/evol-teacher

nickrosh avatar Jul 11 '23 04:07 nickrosh

@nickrosh Legend! Thank you.

mtisz avatar Jul 11 '23 16:07 mtisz