RRHF icon indicating copy to clipboard operation
RRHF copied to clipboard

Runtime error:数据类型报错

Open sqqiao opened this issue 7 months ago • 0 comments

作者好,我在复现RRHF时碰到变量类型报错: 我配置fsdp_config进行分布式训练,当我使用--bf16混合精度时,报错: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while checking arguments for embedding)

如果不使用bf16和tf32,报错: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)

我的fsdp_config配置如图 1

使用的模型是llama3-8b,或者是tokenizer需要重新配置一下吗?

sqqiao avatar Jul 23 '24 03:07 sqqiao