gpt2-ml-finetune- Could you give an example of the input data, i.e. pre

May 21 '20 03:05 EiraZhang

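Any plain-text data will work: articles, books, and so on. It is best to preprocess it first, for example by removing URLs, so that the data is as clean as possible.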

May 21 '20 04:05 wind91725
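
The cleanup suggested above (stripping URLs from the raw text) could look roughly like the sketch below. The regular expression and the helper names `clean_line` and `clean_corpus` are illustrative assumptions, not part of gpt2-ml-finetune- or its pre_data.py, and a real corpus will likely need more rules than this.

```python
import re

# Assumed pattern: anything starting with http://, https:// or www. and running
# up to the next whitespace or common full-width punctuation is treated as a URL.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s，。；！？、）》]+")


def clean_line(line: str) -> str:
    """Remove URLs from one line and collapse the whitespace left behind."""
    without_urls = URL_PATTERN.sub("", line)
    return re.sub(r"[ \t]+", " ", without_urls).strip()


def clean_corpus(lines):
    """Yield cleaned, non-empty lines from an iterable of raw text lines."""
    for line in lines:
        cleaned = clean_line(line)
        if cleaned:
            yield cleaned


raw = [
    "今天天气很好 https://example.com/a?b=1 我们去公园。",
    "http://example.com/only-a-link",
    "详情见www.example.com，谢谢。",
]
cleaned = list(clean_corpus(raw))
for line in cleaned:
    print(line)
```

Lines that contain nothing but a link are dropped entirely, since they carry no text for the model to learn from.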

What if fine-tuning needs to be fed sentence pairs? What should be used as the separator?

Jun 12 '20 09:06 huangdacheng

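Plain natural-language text is enough; books and articles both work, and no extra processing is required. I would also recommend another Chinese pretrained model, https://github.com/lipiji/Guyu , which is much smaller and easy to fine-tune.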

Jun 12 '20 10:06 wind91725

If what I feed it is question-and-answer pairs, is no separator needed either?

Jun 12 '20 10:06 huangdacheng

Thanks for the recommendation, I will take a look at it too. Thank you.

Jun 12 '20 10:06 huangdacheng

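For Q&A data, it is best to format each record as question&answer\n, with a special character separating the question from the answer. At prediction time, feed question& as the input.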

Jun 12 '20 10:06 wind91725
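
The question&answer\n layout described above can be sketched as follows. The helper names `format_pair` and `format_prompt` are hypothetical, and the choice of "&" simply follows the reply; any character that never occurs in the data should work equally well. This is only an illustration of the text format, not the project's own preprocessing code.

```python
SEP = "&"  # separator taken from the reply above; must not appear in the data


def format_pair(question: str, answer: str, sep: str = SEP) -> str:
    """Build one training record: question, separator, answer, newline."""
    if sep in question or sep in answer:
        raise ValueError(f"separator {sep!r} occurs in the data; pick another one")
    # Keep each record on a single line so that "\n" marks the end of the answer.
    question = question.replace("\n", " ").strip()
    answer = answer.replace("\n", " ").strip()
    return f"{question}{sep}{answer}\n"


def format_prompt(question: str, sep: str = SEP) -> str:
    """Build the prediction-time prompt: question plus separator, no answer."""
    return f"{question.strip()}{sep}"


pairs = [("你叫什么名字？", "我叫小明。"), ("今天星期几？", "今天星期五。")]
training_text = "".join(format_pair(q, a) for q, a in pairs)
prompt = format_prompt("你好吗？")
print(training_text, end="")
print(prompt)
```

Because every training record ends with a newline, generation for a prompt can be cut off at the first newline the model emits.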

When Guyu's code generates sentences at the end, it uses probabilistic sampling. Is that a good approach?

Jun 16 '20 07:06 huangdacheng

Yes, that is fine. Guyu has two generation strategies, top-k and top-p, which are the two sampling strategies commonly used with GPT-2.

Jun 17 '20 01:06 wind91725

Hello, looking at the training code in Guyu, does it only predict the last character of sentence X?

May 23 '21 10:05 huangdacheng

Could you give an example of the input data, i.e. the input file format for pre_data.py?
