Raleigh
GDPL still cannot train, the loss is huge!
Hey guys, I am using vanilla PPO with the original reward to train. However, the evaluation (success rate) is not good at all. It could not go higher when it meets...
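Not the repo's actual training code, but a quick sketch of the kind of fix that usually helps when the PPO loss blows up with a large raw task reward: normalize the discounted returns (or advantages) before the policy update. All names and numbers below are made up for illustration.

```python
# Minimal, self-contained illustration (assumption: sparse task-success reward,
# as in dialogue policy training); not ConvLab-2/GDPL code.
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def normalize(x, eps=1e-8):
    """Zero-mean, unit-variance scaling keeps the PPO value/policy loss in a sane range."""
    return (x - x.mean()) / (x.std() + eps)

if __name__ == "__main__":
    # Fake episode: one large reward at the end, like a task-success bonus.
    rewards = np.array([0.0, 0.0, 0.0, 0.0, 40.0])
    raw = discounted_returns(rewards)
    print("raw returns:       ", raw)
    print("normalized returns:", normalize(raw))
```

If the success rate also plateaus, it may be worth checking that the policy is warm-started (e.g., from an imitation/MLE-pretrained policy) before the RL stage, as the repo's examples appear to assume.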
The author also mentions this in the article: there is self-attention as well as controllable recommendation, covered in Section 3.
https://zhuanlan.zhihu.com/p/84526966 They did not copy it; they improved on it on Zhihu.
+1
Could you share your code for training the refined embeddings? The version you shared is pretty old.
A question for everyone: when running the INT4 model, how long does one dialogue turn take on Colab? On my side it is about 10 seconds, but I checked the code and the GPU memory is being used, 5.5 GB. Is this normal?
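For what it's worth, here is a rough sketch of how one might check whether those ~10 seconds are really spent on the GPU. It assumes the THUDM/chatglm-6b-int4 checkpoint and its custom `chat()` interface loaded with `trust_remote_code`; swap in whichever model id you actually use.

```python
# Rough latency/VRAM check; model id and chat() API are assumptions based on
# the ChatGLM-6B README, not on this thread's exact setup.
import time
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "THUDM/chatglm-6b-int4"  # assumption: INT4 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).half().cuda()
model.eval()

print("model device:", next(model.parameters()).device)               # should be cuda:0
print("allocated VRAM (GB):", torch.cuda.memory_allocated() / 1e9)    # roughly 5-6 GB for INT4

start = time.time()
response, history = model.chat(tokenizer, "你好", history=[])
torch.cuda.synchronize()
print(f"one turn took {time.time() - start:.1f} s")
print(response)
```

On a free-tier Colab T4, several seconds up to ~10 seconds per reply is not unusual for a 6B model, especially for longer answers, so 10 s with ~5.5 GB of VRAM in use does not by itself indicate a CPU fallback.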
I'd like to ask: in general, what is the relationship between an embedding model and an ordinary text-generation model? Do you first train a good embedding base model, and can the later chat model then be fine-tuned on top of that embedding model?
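As I understand it (a general answer, not specific to this repo), the two are usually separate fine-tunes of similar pretrained bases with different objectives: the embedding model is trained, often contrastively, to map text to one vector, while the chat model is trained for next-token generation; the chat model is normally not fine-tuned from the embedding model. A small sketch of the two interfaces; the model names below are just common examples, not anything this project ships.

```python
# Illustration only: embedding model (text -> vector) vs. chat model (text -> text).
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

# 1) Embedding model: one fixed-size vector per sentence, used for search/similarity.
emb_tok = AutoTokenizer.from_pretrained("BAAI/bge-small-zh-v1.5")
emb_model = AutoModel.from_pretrained("BAAI/bge-small-zh-v1.5")
with torch.no_grad():
    inputs = emb_tok("什么是embedding模型?", return_tensors="pt")
    vector = emb_model(**inputs).last_hidden_state[:, 0]   # CLS pooling
print("embedding shape:", vector.shape)

# 2) Chat / generative model: produces a text reply token by token.
chat_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
chat_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
prompt = chat_tok.apply_chat_template(
    [{"role": "user", "content": "什么是embedding模型?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = chat_tok(prompt, return_tensors="pt")
out = chat_model.generate(**inputs, max_new_tokens=64)
print(chat_tok.decode(out[0], skip_special_tokens=True))
```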
I was thinking the same. The documentation is too rough, and there are many parts I just cannot understand.