Xielton comments

Results: 12 comments



Why does the model keep repeating my question in its answers after I fine-tune it on my own dataset?

> ![20230603184321](https://user-images.githubusercontent.com/19886457/243082036-2445a432-9c1e-441a-b776-89dcf08be3ab.png)

Did you manage to solve this? I'm hitting the same problem: the answers always just repeat my question.

Why does the model keep repeating my question in its answers after I fine-tune it on my own dataset?

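> There can be many causes: too little training data, hyperparameters that don't match the scale of the data, an unsuitable template, and so on. You'll need to rule them out one by one.

My dataset has 68.9K samples (44 MB). Roughly how should I set the hyperparameters? I fine-tuned with a different framework and ran into the same problem as here.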
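
One way to reason about whether the hyperparameters fit the data scale is to count how many optimizer updates a run actually performs. The sketch below is a back-of-the-envelope calculation only: the function name and the batch size, gradient-accumulation, GPU and epoch values are hypothetical placeholders, not recommendations, and real trainers may round the last partial batch differently.

```python
import math


def optimizer_steps(num_samples: int, per_device_batch: int, grad_accum: int,
                    num_gpus: int, epochs: float) -> int:
    """Approximate number of optimizer updates for one fine-tuning run.

    Assumes each update consumes per_device_batch * grad_accum * num_gpus
    samples and that a final partial batch still counts as one update.
    """
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return math.ceil(steps_per_epoch * epochs)


# 68.9K samples with hypothetical settings: batch 4 per GPU, 8 accumulation steps, 1 GPU, 3 epochs
print(optimizer_steps(68_900, per_device_batch=4, grad_accum=8, num_gpus=1, epochs=3))  # 6462
```

Comparing this count with the warmup length and learning-rate schedule is a quick sanity check; the actual values still have to be found by experiment.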

Repeated output

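> I tried running this model in the text-generation-webui project and it worked fine. I think lmflow's chat code must have a bug: within about three turns of conversation the model either can't answer or starts producing gibberish.

Hi, which text-generation-webui project do you mean? I fine-tuned llama7b and hit the same problem as you: the model produces garbage output. I can't tell whether my fine-tuning failed or the chat script has a bug.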

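> > Your prompt above uses `###`, but the `end_string` below is written as `#`. What does that mean? Should `end_string` be `#` or `###`?
>
> The prompt uses `###` and the end string is `#`. That is how the author said to set it earlier.

Hi, are you using your own fine-tuned model weights? With my own fine-tuned weights and `end_string` added as you described, the model still asks and answers its own questions and gives nonsensical answers. My fine-tuning corpus format is { "type": "text_only", "instances": [ { "text": "Input: Instruction: What is the course code and name for the course on geometric constructions?\n...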
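
When the model keeps writing a new `###Human:` turn and answering it, the reply has to be cut at the end marker before it is shown or stored. The function below is a minimal, hypothetical sketch of that post-processing, not LMFlow's actual chatbot code. It cuts at the first occurrence of `end_string`, so with the shorter `#` marker discussed above it also stops at any stray `#` inside a legitimate answer, while `###` only stops at the turn marker.

```python
def truncate_at_end_string(generated: str, end_string: str = "#") -> str:
    """Return the reply up to (not including) the first end_string.

    Hypothetical helper: with end_string="#" a single '#' ends the reply,
    which also catches the start of '###Human:'.
    """
    cut = generated.find(end_string)
    reply = generated if cut == -1 else generated[:cut]
    return reply.strip()


raw = " Paris is the capital of France.###Human: And of Spain?###Assistant: Madrid."
print(truncate_at_end_string(raw))          # Paris is the capital of France.

raw2 = "Use C# for this.###Human: thanks"
print(truncate_at_end_string(raw2, "#"))    # Use C
print(truncate_at_end_string(raw2, "###"))  # Use C# for this.
```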

Repeated output

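> > > > Your prompt above uses `###`, but the `end_string` below is written as `#`. What does that mean? Should `end_string` be `#` or `###`?
> > >
> > > The prompt uses `###` and the end string is `#`. That is how the author said to set it earlier.
> >
> > Hi, are you using your own fine-tuned model weights? With my own fine-tuned weights and `end_string` added as you described, the model still asks and answers its own questions and gives nonsensical answers. My fine-tuning corpus format is { "type": "text_only", "instances": [ { "text":...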

Repeated output

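> > > > > > Your prompt above uses `###`, but the `end_string` below is written as `#`. What does that mean? Should `end_string` be `#` or `###`?
> > > > >
> > > > > The prompt uses `###` and the end string is `#`. That is how the author said to set it earlier. ...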

Repeated output

> > > > > > > > Your prompt above uses `###`, but the `end_string` below is written as `#`. What does that mean? Should `end_string` be `#` or `###`? ...

Repeated output

> > > `###Human: {input_text}###Assistant:`
> >
> > ![image](https://user-images.githubusercontent.com/102452590/243936279-c78ff5b2-e587-4907-9cb5-5584f6c8e011.png)
> > ![image](https://user-images.githubusercontent.com/102452590/243936606-029510d2-d7e6-49e9-af64-57b58b55bb54.png)
> >
> > I set `end_string` and the prompt the way you described, and it is a bit better than before, but after a few more questions the output still starts repeating. When I run the same model with text-generation-webui (found on GitHub), though, it does not repeat even after 20 turns, so I still think the chatbot code has a problem. @shizhediao

...
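
In multi-turn chat, what the model sees on each turn is the accumulated history. A minimal sketch, assuming the chat loop formats every turn with the `###Human: {input_text}###Assistant:` template quoted above; the helper names, the `###` end string and the history structure are hypothetical, not LMFlow's code. If a reply were stored without being cut at the end marker, every later prompt would carry the model's invented `###Human:` turns, which is one plausible way repetition could build up over a conversation.

```python
from typing import List, Tuple

TEMPLATE = "###Human: {input_text}###Assistant:"  # template quoted in the thread


def clean_reply(generated: str, end_string: str = "###") -> str:
    """Keep only the assistant's own turn (hypothetical helper)."""
    cut = generated.find(end_string)
    return (generated if cut == -1 else generated[:cut]).strip()


def build_prompt(history: List[Tuple[str, str]], new_input: str) -> str:
    """Concatenate past (user, reply) turns followed by the new user input."""
    parts = [TEMPLATE.format(input_text=user) + " " + reply for user, reply in history]
    parts.append(TEMPLATE.format(input_text=new_input))
    return "".join(parts)


history: List[Tuple[str, str]] = []
raw = " Hello! How can I help?###Human: What is 2+2?###Assistant: 4"
history.append(("Hi", clean_reply(raw)))  # the invented second turn is dropped
print(build_prompt(history, "Tell me a joke."))
# ###Human: Hi###Assistant: Hello! How can I help?###Human: Tell me a joke.###Assistant:
```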

repeat the question

> Check the data flow: make sure the inputs and outputs during training are what you expect. You could also try full-model training, although LoRA by itself should not cause the repetition problem either. ...
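
Following the advice to check the data flow, a quick first step is to validate what actually goes into training. The sketch below assumes the LMFlow-style JSON layout seen in this thread (a `"type"` plus a list of `"instances"`, where `text_only` instances carry `"text"` and `text2text` instances carry `"input"` and, by assumption here, `"output"`). The function name and the marker checks are hypothetical; the markers should match whatever template is used at inference time.

```python
import json
from typing import List


def check_dataset(raw_json: str, human_tag: str = "###Human:",
                  assistant_tag: str = "###Assistant:") -> List[str]:
    """Return a list of problems found in an LMFlow-style dataset string."""
    data = json.loads(raw_json)
    problems = []
    for i, inst in enumerate(data.get("instances", [])):
        if data.get("type") == "text_only":
            text = inst.get("text", "")
            if human_tag not in text or assistant_tag not in text:
                problems.append(f"instance {i}: template markers missing")
        elif data.get("type") == "text2text":
            # Assumed convention: the prompt ends where the assistant turn begins.
            if not inst.get("input", "").rstrip().endswith(assistant_tag):
                problems.append(f"instance {i}: input does not end with {assistant_tag}")
            if not inst.get("output", "").strip():
                problems.append(f"instance {i}: empty output")
        else:
            problems.append(f"unknown dataset type: {data.get('type')}")
            break
    return problems


sample = json.dumps({"type": "text2text", "instances": [
    {"input": "###Human: What is a fever?###Assistant:", "output": " A raised body temperature."},
    {"input": "###Human: Hi", "output": ""},
]})
print(check_dataset(sample))
# ['instance 1: input does not end with ###Assistant:', 'instance 1: empty output']
```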

model forgetting

> I fine-tuned robin-7b-v2 on a medical Q&A dataset ([chatdoctor](https://github.com/Kent0n-Li/ChatDoctor)), and even training for only 0.01 epochs makes it lose its conversational ability entirely.
>
> The dataset looks like this: { "type": "text2text", "instances": [ { "input": "###Human: Hi, I have what feels like a lump in my hip felxor, pretty much in the crease...
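
To fine-tune in the `text2text` layout shown above, each question/answer pair has to be wrapped in the same template the chatbot uses at inference time. A minimal sketch, assuming the `###Human:`/`###Assistant:` template from this thread and an `"output"` field alongside `"input"`; the function name, the example pair and the leading space in the output are illustrative choices, not ChatDoctor's or LMFlow's code.

```python
import json
from typing import Dict, List, Tuple


def to_text2text(pairs: List[Tuple[str, str]]) -> Dict:
    """Wrap (question, answer) pairs in the ###Human/###Assistant template."""
    instances = [
        {"input": f"###Human: {q}###Assistant:", "output": f" {a}"}
        for q, a in pairs
    ]
    return {"type": "text2text", "instances": instances}


dataset = to_text2text([("What is the capital of France?", "Paris.")])
print(json.dumps(dataset, indent=2, ensure_ascii=False))
```

Writing the result with `json.dump(dataset, f, ensure_ascii=False)` keeps non-ASCII text readable in the training file.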