InstructionWild
Structure of the training data
Hey guys, I have two questions regarding the structure of your dataset. In the README you say that you are using the "Alpaca" approach, which uses a triple of instruction, input, and output. First, why do you place the concrete instruction at the end rather than at the beginning? Second, I couldn't find any samples that provide a full triple of instruction, input, and output. What is the reason for that? More concretely, you use sample_seed.jsonl to create instructions, so where do you store the input and output?
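
To make the question concrete, this is the kind of triple I would expect per sample. It is only a minimal sketch, assuming the Alpaca-style field names `instruction`, `input`, and `output`; the values are made up and the helper `to_jsonl_line` is hypothetical, not something from this repository:

```python
import json

# Hypothetical record in the Alpaca-style layout. The values are invented
# for illustration and are not taken from this repository's data.
record = {
    "instruction": "Translate the sentence into German.",
    "input": "The weather is nice today.",
    "output": "Das Wetter ist heute schön.",
}


def to_jsonl_line(rec: dict) -> str:
    """Serialize one record as a single JSON Lines entry (hypothetical helper)."""
    return json.dumps(rec, ensure_ascii=False)


line = to_jsonl_line(record)
print(line)
```

If the samples generated from sample_seed.jsonl only carry the instruction, I would expect the `input` and `output` fields to live somewhere else, which is what I am asking about.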