ChatGLM-6B
Dataset structure?
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
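How should the dataset be formatted? For example, if I want to build a QA dataset, is { "q": "xxx", "a": "yyy" } enough? Also, what if the answer spans multiple lines, such as code? How should the QA pair be written in that case?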
Expected Behavior
Five
Steps To Reproduce
no
Environment
- OS: Linux
- Python: 3.8
- Transformers: 4.27
- PyTorch: 2.xxx
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): 11.6
Anything else?
no
For code snippets, my guess is that you keep characters such as "\n" and "\t" inside the answer string.
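To illustrate that guess, here is a minimal sketch of writing a multi-line answer into a one-record-per-line file. It assumes the "q"/"a" keys from the question and uses only Python's standard `json` module; the sample records and the `to_jsonl` helper are hypothetical, not part of the ChatGLM-6B repository.

```python
import json

# Hypothetical sample records; the "q"/"a" keys follow the question above.
records = [
    {
        "q": "Write a function that adds two numbers.",
        "a": "def add(a, b):\n\treturn a + b",  # real newline and tab inside the answer
    },
    {"q": "xxx", "a": "yyy"},
]


def to_jsonl(rows):
    """Serialize each record as one JSON object per line.

    json.dumps escapes newlines and tabs inside strings as \\n and \\t,
    so a multi-line answer still occupies a single physical line.
    """
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


jsonl_text = to_jsonl(records)
lines = jsonl_text.split("\n")

# Reading the lines back restores the original multi-line answer.
restored = [json.loads(line) for line in lines]
```

Letting a JSON library do the escaping avoids typing "\n" and "\t" by hand, and parsing the line back returns the answer with its original line breaks.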
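Is a plain q/a pair enough for the dataset? Or should it follow a prompt format, for example a triple of prompt, input, and output?
Have you found a way to build it yet?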
{ "q": "xxx", "a": "yyy" }
{ "q": "xxx", "a": "yyy" }
For training, you also need to set this up in the sh script.
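Before pointing a training script at such a file, it can help to confirm that every record parses and carries both keys. The sketch below is only a sanity check under assumptions: `load_pairs` and its `prompt_column` / `response_column` parameters are hypothetical names chosen for illustration, and the thread does not say which options the sh script actually expects.

```python
import io
import json


def load_pairs(fp, prompt_column="q", response_column="a"):
    """Read one JSON object per line and return (prompt, response) tuples.

    The column names are assumptions taken from the example in this thread.
    """
    pairs = []
    for line_no, line in enumerate(fp, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        row = json.loads(line)
        for key in (prompt_column, response_column):
            if key not in row:
                raise KeyError(f"line {line_no}: missing key {key!r}")
        pairs.append((row[prompt_column], row[response_column]))
    return pairs


# In-memory stand-in for a dataset file with the layout shown above.
sample = io.StringIO('{ "q": "xxx", "a": "yyy" }\n{ "q": "xxx", "a": "yyy" }\n')
pairs = load_pairs(sample)
```

If the keys in the file differ from what the training configuration expects, a check like this fails early with the offending line number rather than partway through training.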