baize-chatbot

Trained on my own data, but the answers are not very accurate.

Open • yfq512 opened this issue 2 years ago • 1 comment

I collected some Chinese data about "中国云南" (Yunnan, China), like this: [image: 0417-2]. I then trained on it following the README, based on Baize-7B. Training took 48 hours and produced checkpoints. When I run app.py with these checkpoints, the AI can speak Chinese, but it sometimes mixes in English and Russian, and the answers are not very accurate. [image: 0417]

Can you help me analyze the cause? Is it a lack of training data? Is it because the base model is an English model? Or is it something else? Thanks.

yfq512 • Apr 17 '23 02:04

Are you using LLaMA as the foundation model? If so, since LLaMA had essentially no Chinese pretraining data, it's not very surprising that the results aren't very good. BLOOMZ may be a better foundation model.

JetRunner • Apr 18 '23 06:04
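One way to put a number on the language mixing reported in this issue is to measure which scripts show up in the model's replies. The snippet below is a minimal sketch using only the Python standard library. It is not part of baize-chatbot, and `script_of` and `script_mix` are hypothetical helper names introduced here for illustration. It classifies letters coarsely by their Unicode character names, so it separates Chinese, Latin, and Cyrillic text but does not identify individual languages.

```python
import unicodedata
from collections import Counter
from typing import Dict


def script_of(ch: str) -> str:
    """Coarsely classify one character by the script in its Unicode name.

    Hypothetical helper for illustration; not part of baize-chatbot.
    """
    if not ch.isalpha():
        return "other"  # digits, punctuation, whitespace
    name = unicodedata.name(ch, "")
    if name.startswith("CJK"):
        return "cjk"
    if name.startswith("CYRILLIC"):
        return "cyrillic"
    if name.startswith("LATIN"):
        return "latin"
    return "other"


def script_mix(text: str) -> Dict[str, float]:
    """Return the share of CJK, Cyrillic, and Latin letters in `text`.

    Characters classified as "other" are ignored; an empty dict is
    returned when the text contains no letters from these scripts.
    """
    counts = Counter(script_of(ch) for ch in text)
    counts.pop("other", None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {script: n / total for script, n in counts.items()}


if __name__ == "__main__":
    # A reply that mixes Chinese, English, and Russian.
    print(script_mix("云南 is привет"))
```

Running `script_mix` over a batch of replies from app.py to Chinese prompts gives a rough baseline: if a noticeable share of letters is Latin or Cyrillic, that is consistent with the foundation model falling back to languages it saw more of during pretraining, and the same measurement can be repeated after switching base models to see whether the share drops.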