
About preprocessing of parallel corpora

Open lyy-zz opened this issue 2 years ago • 3 comments

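Hello, was any special formatting or concatenation applied when preprocessing the parallel corpora? For example, were the Chinese and English sides joined into a single line with a special separator, or something similar?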

lyy-zz avatar May 30 '23 11:05 lyy-zz

Same question: does pre-training need a prompt, for example "please translate English to Chinese"?

mynewstart avatar Aug 18 '23 03:08 mynewstart

There is no prompt.



ydli-ai avatar Aug 18 '23 03:08 ydli-ai

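In that case, is the training loss for parallel corpora computed the same way as for the other corpora, i.e. summed over every next-token prediction, or is it computed only over the English/Chinese token part?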

mynewstart avatar Aug 21 '23 08:08 mynewstart
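
For readers following the last question, the two options it contrasts can be sketched in plain Python. This is only an illustration of the distinction, not Linly's training code: the thread does not say which variant the project uses, and the names `token_nll`, `lm_loss`, and `loss_mask`, as well as the toy token IDs and logits, are hypothetical.

```python
import math

def token_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def lm_loss(logits_per_pos, token_ids, loss_mask=None):
    """Mean next-token loss; position t predicts token t + 1.

    loss_mask[i] == 1 means token i counts as a prediction target.
    With loss_mask=None every predicted token counts (option 1).
    """
    if loss_mask is None:
        loss_mask = [1] * len(token_ids)
    total, count = 0.0, 0
    for t in range(len(token_ids) - 1):
        if loss_mask[t + 1]:
            total += token_nll(logits_per_pos[t], token_ids[t + 1])
            count += 1
    return total / max(count, 1)

# Toy sequence (hypothetical): tokens 0, 1 are the source side,
# tokens 2, 3 are the target side of one parallel pair.
token_ids = [0, 1, 2, 3]
logits_per_pos = [
    [0.0, 0.0, 0.0, 0.0],   # predicts token 1 (source side), uninformed
    [0.0, 0.0, 10.0, 0.0],  # predicts token 2 (target side), confident
    [0.0, 0.0, 0.0, 10.0],  # predicts token 3 (target side), confident
]

# Option 1: loss over every next token, as for ordinary text.
full_loss = lm_loss(logits_per_pos, token_ids)

# Option 2: loss over the target-side tokens only.
target_only_loss = lm_loss(logits_per_pos, token_ids, loss_mask=[0, 0, 1, 1])
```

In this toy case the two values differ because the source-side prediction is poor while the target-side predictions are good; masking simply removes the source-side term from the average.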