GDPL still fails to train; the loss is very large!
Hey guys, I am training with vanilla PPO and the original reward. However, the evaluation result (success rate) is not good at all. It could not go higher when it meets...
The author also mentions this in the article; self attention and controllable recommendation are covered there too, in Chapter 3.
https://zhuanlan.zhihu.com/p/84526966 They did not copy it; they improved on it in this Zhihu post.
Could you share your code for training the refined embeddings? The version you shared is pretty old.
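I'd like to ask: in general, what is the relationship between an embedding model and an ordinary text-generation model? Is the idea to train a good embedding base model first, so that a chat model can later be fine-tuned from that embedding model?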