
About source_max_length and target_max_len in rlhf

Open jiahuanluo opened this issue 11 months ago • 3 comments

In qlora_dpo.py, I see that chosen is tokenized with max_length=self.source_max_len, while rejected is tokenized with max_length=self.target_max_len. Why is that? https://github.com/lyogavin/Anima/blob/dc691b2958f50a6d73a239b0e13c341ce6b2d60f/rlhf/qlora_dpo.py#L491 We had assumed that source_max_len refers to the length of instruction + query, and that target_max_len refers to the length of the response.

Jul 06 '23 10:07 jiahuanluo
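To make the difference concrete, here is a minimal sketch contrasting the behavior described in the question with the reading the poster expected. It is not the code from qlora_dpo.py: `toy_tokenize`, `encode_as_described`, and `encode_as_expected` are hypothetical names, and a whitespace split stands in for a real tokenizer with truncation, purely as an illustration.

```python
from typing import Dict, List


def toy_tokenize(text: str, max_length: int) -> List[str]:
    # Hypothetical stand-in for a real tokenizer with truncation enabled:
    # split on whitespace and keep at most max_length tokens.
    return text.split()[:max_length]


def encode_as_described(chosen: str, rejected: str,
                        source_max_len: int, target_max_len: int) -> Dict[str, List[str]]:
    # Behavior described in the question: the two responses get different caps.
    return {
        "chosen": toy_tokenize(chosen, source_max_len),
        "rejected": toy_tokenize(rejected, target_max_len),
    }


def encode_as_expected(prompt: str, chosen: str, rejected: str,
                       source_max_len: int, target_max_len: int) -> Dict[str, List[str]]:
    # Reading assumed in the question: source_max_len caps the prompt
    # (instruction + query), target_max_len caps both responses.
    return {
        "prompt": toy_tokenize(prompt, source_max_len),
        "chosen": toy_tokenize(chosen, target_max_len),
        "rejected": toy_tokenize(rejected, target_max_len),
    }


prompt = "what is six times seven"
chosen = "the answer is forty two because six times seven"    # 9 words
rejected = "the answer is probably around forty or so maybe"  # 9 words

described = encode_as_described(chosen, rejected, source_max_len=8, target_max_len=4)
expected = encode_as_expected(prompt, chosen, rejected, source_max_len=8, target_max_len=4)

print(len(described["chosen"]), len(described["rejected"]))  # 8 4
print(len(expected["chosen"]), len(expected["rejected"]))    # 4 4
```

In this toy setup, whenever the two limits differ, the chosen and rejected responses are truncated to different lengths under the described behavior, whereas the expected reading truncates them identically.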
