Please @showgood880702, I need this dataset as a text2text .json file to understand the structure correctly

Open antonious-emad opened this issue 9 months ago • 1 comments

{

"input": "###Instruction: ....\n\n###human: ....\n\n###chatbot: ....\n\n###human: ....\n\n###chatbot: ....\n\n###human: .....\n\n###chatbot:",

"output": ".....###"

}

Thank you very much for the explanation.

I am still a little confused about the training data structure for a chatbot. For example, here I have a multi-round conversation used as training data. Should I feed it to the model as I showed before, with the end_mark at the end?

{"input": "###Instruction: ....\n\n###human: ....\n\n###:chatbot:", "output": ".....###"}

{"input": "###Instruction: ....\n\n###human: ....\n\n###:chatbot:", "output": ".....###"}

{"input": "###Instruction: ....\n\n###human: ....\n\n###:chatbot:", "output": ".....###"}

or should I split them into pairs of human and chatbot turns as different instances, and start each with the instruction?

Originally posted by @showgood880702 in #357
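
For anyone comparing the two layouts asked about above, here is a minimal Python sketch of how one multi-round conversation could be turned into `input`/`output` instances. It is only an illustration under assumptions: the helper names (`build_prompt`, `split_into_instances`), the `keep_history` switch, and the sample turns are hypothetical, and the `###` markers are copied from the examples in this thread rather than from any confirmed specification. With `keep_history=True` each instance carries the earlier turns in its input (the first layout); with `keep_history=False` each human/chatbot pair becomes an independent instance that starts with the instruction (the second layout).

```python
import json

END_MARK = "###"  # assumption: end marker taken from the examples in this thread


def build_prompt(instruction, history, human_msg):
    """Render the instruction, earlier (human, chatbot) turns, and the new human message."""
    parts = [f"###Instruction: {instruction}"]
    for human, chatbot in history:
        parts.append(f"###human: {human}")
        parts.append(f"###chatbot: {chatbot}")
    parts.append(f"###human: {human_msg}")
    parts.append("###chatbot:")  # the model is expected to continue from here
    return "\n\n".join(parts)


def split_into_instances(instruction, turns, keep_history=True):
    """Build one input/output instance per chatbot reply.

    keep_history=True  -> earlier turns are included in each input
    keep_history=False -> each pair stands alone after the instruction
    """
    instances = []
    for i, (human, chatbot) in enumerate(turns):
        history = turns[:i] if keep_history else []
        instances.append({
            "input": build_prompt(instruction, history, human),
            "output": f"{chatbot}{END_MARK}",
        })
    return instances


# Hypothetical sample conversation, for illustration only
turns = [("Hi", "Hello!"), ("How are you?", "Fine, thanks.")]
print(json.dumps(split_into_instances("Be helpful.", turns), indent=2))
```

Which of the two layouts the training code actually expects is exactly the open question in this thread, so treat the switch as a way to generate both and compare, not as an answer.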

Apr 01 '25 11:04 antonious-emad

Hi @showgood880702, if it is possible to share your dataset in the structured text2text format, that would really help me. I need your help, please 🙏

Apr 01 '25 21:04 antonious-emad