Raleigh

11 comments by Raleigh

GDPL still could not train; the loss is far too large!

Hey guys, I am using vanilla PPO with the original reward to train. However, the evaluation (success rate) is not good at all. It could not go higher when it meets...

> Hey guys, I am using vanilla PPO with the original reward to train. However, the evaluation (success rate) is not good at all. It could not go higher...

The author also mentioned this in the article: there is also self-attention and controllable recommendation, covered in Chapter 3.

https://zhuanlan.zhihu.com/p/84526966 They didn't copy it; they built on it and improved it on Zhihu.

Could you share your code of training the refined embeddings? The one you shared is pretty old.

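A question for everyone: when running the INT4 model, how long does a single conversation turn take on Colab? For me it takes about 10 seconds, but when I checked, the GPU is in use (5.5 GB of GPU memory allocated). Is this normal?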

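I'd like to ask: what is the relationship between an embedding model and an ordinary text-generation model? Is the idea to train a good embedding base model first, so that a chat model can later be fine-tuned on top of that embedding model?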

I think so too. The documentation is far too rough; many parts are hard to understand.