Le Zhang
same issue here
This probably isn't a bug. In the end I rewrote the reward function implementation with asyncio, and it became fast.
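Below is a minimal sketch of that idea. It assumes the slow part of the reward function is I/O-bound, for example a call to a remote judge or sandbox. score_one, _score_all and reward_fn are hypothetical names, not the actual implementation from this comment. A short sleep stands in for the I/O wait, and a semaphore caps how many calls run at once.

```python
import asyncio
from typing import List


# Hypothetical per-sample scorer. In a real setup this might be an HTTP call
# to a judge model; here asyncio.sleep simulates the network latency.
async def score_one(completion: str) -> float:
    await asyncio.sleep(0.1)  # simulated I/O wait
    return 1.0 if completion else 0.0  # toy reward: non-empty -> 1.0


async def _score_all(completions: List[str], max_concurrency: int) -> List[float]:
    # Bound concurrency so we don't flood the remote service.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(c: str) -> float:
        async with sem:
            return await score_one(c)

    # gather preserves input order, so rewards line up with completions.
    return await asyncio.gather(*(bounded(c) for c in completions))


def reward_fn(completions: List[str], max_concurrency: int = 16) -> List[float]:
    """Synchronous entry point that scores all completions concurrently."""
    return asyncio.run(_score_all(completions, max_concurrency))


if __name__ == "__main__":
    print(reward_fn(["a", "", "bc"]))
```

With the sequential version, a batch of N samples waits N times the per-call latency. Here the waits overlap up to max_concurrency at a time. Note that asyncio.run cannot be called from inside an already running event loop. If the training framework is itself async, await _score_all directly instead.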
Does this require us to implement generate_inner in the client model ourselves?
Also, both llava-pretrain and sbu558k exist in the alignment data. What is the difference between them?
Thanks, that makes sense then.