ChatGLM-6B

Dataset structure?

Open sanwei111 opened this issue 2 years ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

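What format should the dataset use? For example, if I want to build a QA dataset, is { "q": "xxx", "a": "yyy" } enough? Also, what if the answer needs to span multiple lines, such as code output? How should the QA pair be written in that case?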

Expected Behavior

Five

Steps To Reproduce

no

Environment

- OS:linux
- Python:3.8
- Transformers:4.27
- PyTorch:2.xxx
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :11.6

Anything else?

no

sanwei111, May 22 '23 02:05

For code snippets, my guess is that you also include characters such as "\n" and "\t" in the answer string.

roki1031, May 23 '23 08:05
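The guess above about "\n" and "\t" can be checked with a small sketch. It assumes the records are stored as JSON and uses Python's standard `json` module; the question and answer text are made up for illustration. The serializer escapes the line break inside the answer, so a multi-line code answer still fits in a single { "q": ..., "a": ... } record.

```python
import json

# Hypothetical multi-line answer: a two-line Python function.
answer = "def add(a, b):\n    return a + b"
record = {"q": "Write a Python function that adds two numbers.", "a": answer}

# json.dumps writes the line break as the two characters backslash + n,
# so the serialized record stays on one physical line.
line = json.dumps(record, ensure_ascii=False)

# json.loads turns the escape back into a real line break.
restored = json.loads(line)

print(line)
print(restored["a"])
```

Writing the escapes by hand works too, but letting the serializer produce them avoids mistakes with quotes and backslashes inside code answers.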

So plain QA pairs are enough for the dataset? Or should it follow a prompt format, such as a triple of prompt, input, output?

sanwei111, May 24 '23 06:05

Have you found a way to build the dataset yet?

sunshinesDL, May 27 '23 08:05

{ "q": "xxx", "a": "yyy" } { "q": "xxx", "a": "yyy" } For training, you need to configure this in the .sh script.

FrankXuFromCN, May 31 '23 05:05
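A minimal sketch of producing a file in the shape FrankXuFromCN shows, assuming each record sits on its own line (the thread shows the two records run together, so the one-per-line layout is an assumption). The keys `q` and `a` come from the thread; the file name `train.json` and the helpers `write_qa_jsonl` and `read_qa_jsonl` are hypothetical. The thread does not say which settings in the .sh script need to change, so the sketch covers only the data file.

```python
import json
import tempfile
from pathlib import Path


def write_qa_jsonl(pairs, path):
    """Write (question, answer) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for q, a in pairs:
            f.write(json.dumps({"q": q, "a": a}, ensure_ascii=False) + "\n")


def read_qa_jsonl(path):
    """Read the records back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative data; the second answer spans two lines.
pairs = [
    ("xxx", "yyy"),
    ("How do I print twice?", "print('a')\nprint('b')"),
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.json"  # hypothetical file name
    write_qa_jsonl(pairs, path)
    n_lines = len(path.read_text(encoding="utf-8").splitlines())
    records = read_qa_jsonl(path)
```

Whatever column names the training script expects would have to match the keys used here, so `q` and `a` may need renaming to fit the script's configuration.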