RRHF
[NIPS2023] RRHF & Wombat
Hi, this is nice work. I have some questions regarding the results in the **Comparison based on Vicuna test set** section shown in the README. How score A and score B are...
This is good work. However, we usually use BPRLoss rather than HingeLoss in pairwise learning to rank, since the margin of HingeLoss is hard to tune. So I wonder whether...
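For readers unfamiliar with the two losses named in this question, here is a minimal sketch of the textbook pairwise forms. It is not taken from the RRHF code; the function names and the default `margin=1.0` are assumptions chosen for illustration. The hinge loss has a margin to tune, while the BPR loss has none.

```python
import math

def hinge_loss(score_pos, score_neg, margin=1.0):
    """Pairwise hinge loss: zero once the preferred item leads by `margin`.

    `margin` is the hyperparameter the question describes as hard to tune.
    """
    return max(0.0, margin - (score_pos - score_neg))

def bpr_loss(score_pos, score_neg):
    """BPR loss: -log(sigmoid(score_pos - score_neg)), with no margin."""
    diff = score_pos - score_neg
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); branch on the sign
    # so that exp() is only ever called on a non-positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

The hinge loss stops producing gradient once the margin is satisfied, whereas the BPR loss stays strictly positive and keeps pushing the pair apart, which is one reason the choice of margin matters for the former.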
The idea of this paper is really great and much easier to understand than PPO. However, if there are six candidate responses, then the batch size should at least be equal...
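Hello authors, I ran into a type error while reproducing RRHF. I configured fsdp_config for distributed training, and when I use --bf16 mixed precision I get this error: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got CUDABFloat16Type instead (while...
Thanks to the authors for this work. Could you share the code for RRHF-Online Sampling? I would like to run a reproduction experiment.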
Hi, thanks for your great work! I would like to point out a potential bug in this code: add_special_tokens without checking embedding size is very dangerous especially for llama. In...
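The issue text above is cut off, so the exact failure it reports is not known here. As a hedged sketch of the general precaution it names, the helper below adds special tokens and then grows the embedding matrix only when the tokenizer has outgrown it. It assumes Hugging Face `transformers`-style objects (`tokenizer.add_special_tokens`, `len(tokenizer)`, `model.get_input_embeddings()`, `model.resize_token_embeddings`); the name `add_special_tokens_safely` is hypothetical and does not come from the RRHF repository.

```python
def add_special_tokens_safely(tokenizer, model, special_tokens):
    """Add special tokens, then resize embeddings only if they are too small.

    `special_tokens` is a dict such as {"pad_token": "[PAD]"} (example value).
    Returns the number of tokens the tokenizer actually added.
    """
    num_added = tokenizer.add_special_tokens(special_tokens)

    vocab_size = len(tokenizer)
    embedding_rows = model.get_input_embeddings().weight.shape[0]

    # Grow only when a token id could index past the last embedding row.
    # Never shrink: some checkpoints ship an embedding matrix that is
    # deliberately larger than the tokenizer vocabulary.
    if vocab_size > embedding_rows:
        model.resize_token_embeddings(vocab_size)

    return num_added
```

Comparing `len(tokenizer)` against the embedding row count, rather than resizing unconditionally, avoids both out-of-range token ids and accidental truncation of a padded embedding matrix.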