Raleigh
GDPL still cannot train, the loss is huge!
Hey guys, I am using vanilla PPO with the original reward to train. However, the evaluation (success rate) is not good at all. It could not go higher when it meets...
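Not the repo's actual training code, but a quick sketch of the kind of fix that usually helps when the PPO loss blows up with a large raw task reward: normalize the discounted returns (or advantages) before the policy update. All names and numbers below are made up for illustration.

```python
# Minimal, self-contained illustration (assumption: sparse task-success reward,
# as in dialogue policy training); not ConvLab-2/GDPL code.
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def normalize(x, eps=1e-8):
    """Zero-mean, unit-variance scaling keeps the PPO value/policy loss in a sane range."""
    return (x - x.mean()) / (x.std() + eps)

if __name__ == "__main__":
    # Fake episode: one large reward at the end, like a task-success bonus.
    rewards = np.array([0.0, 0.0, 0.0, 0.0, 40.0])
    raw = discounted_returns(rewards)
    print("raw returns:       ", raw)
    print("normalized returns:", normalize(raw))
```

If the success rate also plateaus, it may be worth checking that the policy is warm-started (e.g., from an imitation/MLE-pretrained policy) before the RL stage, as the repo's examples appear to assume.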
The author also mentions this in the article: there is self-attention as well as controllable recommendation, covered in Section 3.
https://zhuanlan.zhihu.com/p/84526966 They did not copy it; they improved on it on Zhihu.
+1
Could you share your code for training the refined embeddings? The version you shared is pretty old.
A question for everyone: when running the INT4 model, how long does one dialogue turn take on Colab? On my side it is about 10 seconds, but I checked the code and the GPU memory is being used, 5.5 GB. Is this normal?
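For what it's worth, here is a rough sketch of how one might check whether those ~10 seconds are really spent on the GPU. It assumes the THUDM/chatglm-6b-int4 checkpoint and its custom `chat()` interface loaded with `trust_remote_code`; swap in whichever model id you actually use.

```python
# Rough latency/VRAM check; model id and chat() API are assumptions based on
# the ChatGLM-6B README, not on this thread's exact setup.
import time
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "THUDM/chatglm-6b-int4"  # assumption: INT4 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).half().cuda()
model.eval()

print("model device:", next(model.parameters()).device)               # should be cuda:0
print("allocated VRAM (GB):", torch.cuda.memory_allocated() / 1e9)    # roughly 5-6 GB for INT4

start = time.time()
response, history = model.chat(tokenizer, "你好", history=[])
torch.cuda.synchronize()
print(f"one turn took {time.time() - start:.1f} s")
print(response)
```

On a free-tier Colab T4, several seconds up to ~10 seconds per reply is not unusual for a 6B model, especially for longer answers, so 10 s with ~5.5 GB of VRAM in use does not by itself indicate a CPU fallback.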
I'd like to ask: in general, what is the relationship between an embedding model and an ordinary text-generation model? Do you first train a good embedding base model, and can the later chat model then be fine-tuned on top of that embedding model?
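As I understand it (a general answer, not specific to this repo), the two are usually separate fine-tunes of similar pretrained bases with different objectives: the embedding model is trained, often contrastively, to map text to one vector, while the chat model is trained for next-token generation; the chat model is normally not fine-tuned from the embedding model. A small sketch of the two interfaces; the model names below are just common examples, not anything this project ships.

```python
# Illustration only: embedding model (text -> vector) vs. chat model (text -> text).
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

# 1) Embedding model: one fixed-size vector per sentence, used for search/similarity.
emb_tok = AutoTokenizer.from_pretrained("BAAI/bge-small-zh-v1.5")
emb_model = AutoModel.from_pretrained("BAAI/bge-small-zh-v1.5")
with torch.no_grad():
    inputs = emb_tok("什么是embedding模型?", return_tensors="pt")
    vector = emb_model(**inputs).last_hidden_state[:, 0]   # CLS pooling
print("embedding shape:", vector.shape)

# 2) Chat / generative model: produces a text reply token by token.
chat_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
chat_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
prompt = chat_tok.apply_chat_template(
    [{"role": "user", "content": "什么是embedding模型?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = chat_tok(prompt, return_tensors="pt")
out = chat_model.generate(**inputs, max_new_tokens=64)
print(chat_tok.decode(out[0], skip_special_tokens=True))
```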
I was thinking the same. The documentation is too rough, and there are many parts I just cannot understand.