2 comments by Zhilin
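> > > 1. The template for DeepSeek-R1-Distill-Qwen-32B is qwen, right? 2. Eight 3090 cards may not have enough GPU memory.
> > >
> > > 1. template: deepseek3, so the template is deepseek3. 2. With 8*24G and z3-offload, is that still not enough? I'm not sure how much GPU memory is needed. Any advice would be appreciated. [@jienimi](https://github.com/jienimi)
> >
> > The backbone is qwen, so of course it uses the qwen template; it has nothing to do with deepseek.

I checked and confirmed that it is the deepseek3 template. Although the backbone is qwen, they did SFT with their own template.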
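On the memory question, a rough back-of-the-envelope sketch may help. It assumes full-parameter fine-tuning with mixed-precision Adam at about 16 bytes of model state per parameter (fp16 weights and gradients plus fp32 optimizer states) and roughly 32e9 parameters. Neither assumption is confirmed in the thread, and activations are ignored, so treat the numbers as an order-of-magnitude guide only.

```python
# Rough model-state estimate; every constant here is an assumption, not a measurement.
# 16 bytes/param = 2 (fp16 weights) + 2 (fp16 grads) + 12 (fp32 weights, momentum, variance).

def model_state_gb(num_params: float, bytes_per_param: int = 16) -> float:
    """Return the model-state footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 32e9   # assumed parameter count for a "32B" model
NUM_GPUS = 8        # 8 cards, as in the thread
GPU_MEM_GB = 24     # 24 GB per card, as in the thread

total_states_gb = model_state_gb(NUM_PARAMS)
total_gpu_gb = NUM_GPUS * GPU_MEM_GB

print(f"model states: {total_states_gb:.0f} GB")   # model states: 512 GB
print(f"total GPU memory: {total_gpu_gb} GB")      # total GPU memory: 192 GB
```

Under these assumptions the model states alone exceed the combined GPU memory, so whether z3-offload works would depend mostly on available host RAM and on activation memory. A parameter-efficient method such as LoRA would change the picture entirely.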
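I have the same issue, and I think it is indeed a bug. The following code should be correct:

```python
# Drop the last position of each sequence, flatten, and move to the training device.
loss_mask = batch.pop("loss_mask")[:, :-1].reshape(-1).to(self.device_name)
```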
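To make the effect of that expression concrete, here is a minimal sketch on a made-up 2x4 mask. The tensor values are hypothetical, and the `.to(self.device_name)` step is left out because there is no trainer object here. It only shows what the slicing and flattening do, not how the surrounding trainer uses the result.

```python
import torch

# Toy batch: 2 sequences of length 4; 1 marks tokens that count toward the loss.
# These values are illustrative only.
loss_mask = torch.tensor([[0, 1, 1, 1],
                          [0, 0, 1, 1]])

# Same slicing and flattening as in the comment, minus the device transfer.
flat_mask = loss_mask[:, :-1].reshape(-1)

print(flat_mask.tolist())  # [0, 1, 1, 0, 0, 1]
print(flat_mask.shape)     # torch.Size([6])
```

Each row loses its last entry, so the flattened mask has `batch_size * (seq_len - 1)` elements. That matches the number of next-token predictions in the usual shifted-label setup, which I am assuming is what the trainer uses.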