
The content in pred.txt is repetitive

wangDong524 opened this issue 4 years ago

I used the Transformer model to train on my Chinese dataset. After running translate.py, the content in pred.txt is repetitive: the same prediction appears for many source sentences, and the output does not correspond to the source.

SENT 713: ['特', '斯', '拉', '发', '布', '6.0', '版', '本', '固', '件', '后', ',', '允', '许', '车', '主', '通', '过', '手', '机', '端', 'app', '软', '件', '驾', '车', ',', '不', '用', '随', '身', '携', '带', '钥', '匙', '。', '但', '激', '活', '过', '程', '中', '会', '受', '到', '识', '别', '过', '程', '复', '杂', ',', '手', '机', '信', '号', '不', '稳', '定', '的', '困', '扰', ',', '实', '际', '使', '用', '并', '不', '能', '完', '全', '放', '弃', '传', '统', '钥', '匙', '。(', '分', '享', '自', '@', '电', '动', '邦', ')'] PRED 713: 抢 高 铁 票 改 下 午 了 ! 铁 路 部 门 增 6 个 放 票 时 间 点 PRED SCORE: -2.0617

SENT 714: ['在', '长', '安', '逸', '动', 'ev', '的', '发', '布', '会', '上', ',', '逸', '动', '公', '布', '了', '补', '贴', '后', '14.49', '至', '15.99', '万', '元', '的', '补', '贴', '价', '。', '作', '为', '国', '内', '首', '款', '紧', '凑', '型', '三', '厢', '纯', '电', '动', '车', ',', '凭', '借', '着', '2660mm', '的', '轴', '距', '空', '间', '表', '现', ',', '它', '将', '拥', '有', '很', '强', '的', '市', '场', '竞', '争', '力', '。(', '分', '享', '自', '@', '电', '动', '邦', ')'] PRED 714: 抢 高 铁 票 改 下 午 了 ! 铁 路 部 门 增 6 个 放 票 时 间 点 PRED SCORE: -1.9775

wangDong524, Mar 07 '20

Your model is probably overfitting, and only learned to output this sentence.
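A quick way to check this is to count how many distinct hypotheses actually appear in the output file. The following is a minimal sketch; it assumes pred.txt sits in the current directory, one prediction per line:

```python
from collections import Counter

# Load the translation output produced by translate.py.
with open("pred.txt", encoding="utf-8") as f:
    preds = [line.strip() for line in f if line.strip()]

# If the model has collapsed to one or two sentences, the most common
# prediction will account for nearly every line of pred.txt.
counts = Counter(preds)
print(f"{len(preds)} predictions, {len(counts)} unique")
for sent, n in counts.most_common(3):
    print(f"{n:>6}  {sent}")
```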

francoishernandez, Mar 07 '20

Your dataset might be too tiny. How many lines?

Tokenizing the Chinese text into single characters, as in your SENT lines, will not give good results. Try using HanLP and/or SentencePiece instead (see the sketch below).
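For reference, here is a minimal SentencePiece sketch, not an official recipe; the file names train.zh and zh_bpe are placeholders, and the sample sentence is taken from SENT 713 above. Apply the trained model to your source and target text before training with OpenNMT-py, and decode the pieces back to plain text after translation:

```python
import sentencepiece as spm

# Train a subword (BPE) model on the raw, untokenized Chinese text.
spm.SentencePieceTrainer.train(
    input="train.zh",           # placeholder path to raw training text
    model_prefix="zh_bpe",
    vocab_size=32000,
    character_coverage=0.9995,  # recommended for large character sets such as Chinese
    model_type="bpe",
)

# Segment text into subword pieces before feeding it to OpenNMT-py,
# and de-tokenize the predictions afterwards.
sp = spm.SentencePieceProcessor(model_file="zh_bpe.model")
pieces = sp.encode("特斯拉发布6.0版本固件后,允许车主通过手机端app软件驾车。", out_type=str)
print(" ".join(pieces))   # subword-segmented line for training/translation
print(sp.decode(pieces))  # reconstructed original sentence
```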

JOHW85, Apr 04 '20