Mr-Potential
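The [code](https://github.com/OpenLMLab/MOSS-RLHF/blob/main/ppo/ppo_datahelper.py#L201) needs the reward at each position, `reward[t]`, when computing GAE for every token position. However, when `penalized_rewards` is computed, the reward is added only at the final step, i.e. `penalized_rewards[-1] += rewards[i]`, so at every other position `penalized_rewards` contains only the KL penalty. Should the rewards for those other states be taken into account as well?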
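To make the setup in the question concrete, here is a minimal sketch of the pattern being described: a KL penalty at every token, the scalar reward added only at the last token, then a backward GAE pass. This is not the MOSS-RLHF implementation; the function names, `kl_coef`, and the default `gamma`/`lam` values are assumptions chosen for illustration.

```python
from typing import List


def penalized_rewards(kl_per_token: List[float],
                      sequence_reward: float,
                      kl_coef: float = 0.1) -> List[float]:
    """Per-token rewards: a KL penalty at every position, plus the scalar
    sequence-level reward added only at the final token.
    (Hypothetical helper; kl_coef is an assumed hyperparameter.)"""
    rewards = [-kl_coef * kl for kl in kl_per_token]
    rewards[-1] += sequence_reward
    return rewards


def gae(rewards: List[float],
        values: List[float],
        gamma: float = 1.0,
        lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation over one response, computed backwards."""
    advantages = [0.0] * len(rewards)
    last_adv = 0.0
    for t in reversed(range(len(rewards))):
        # The value after the final token is treated as 0 (episode ends there).
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    return advantages


# Toy example: no KL penalty, zero value estimates, reward 1.0 at the end.
r = penalized_rewards([0.0, 0.0, 0.0], sequence_reward=1.0)
adv = gae(r, values=[0.0, 0.0, 0.0], gamma=1.0, lam=0.5)
print(adv)
```

In this toy case the advantages at the earlier positions are non-zero even though their own `rewards[t]` entries are zero, because the backward recursion carries the final-token reward to earlier steps, discounted by `gamma * lam` per step.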
I have also encountered this issue. You can resolve it by adding the specified code [here](https://github.com/mem0ai/mem0/blob/main/mem0/vector_stores/faiss.py#L311C9-L317C94). Change:
```
if index_to_delete is not None:
    self.docstore.pop(vector_id, None)
    self.index_to_id.pop(index_to_delete, None)
    self._save()
    logger.info(f"Deleted vector...
```
Thank you for the reminder! It does help! In addition to the aforementioned name discrepancies, 'lm_head.weight' also needs to be revised. The complete revisions are as follows, for those who...
@ZSL98 Hi, may I ask if this part could be included in future updates? Thanks!