amulil
> LMDeploy hasn't supported window attention yet.

@lvhan028 Will LMDeploy support window attention? It seems LongLoRA uses window attention. If I want to deploy a LongLoRA model, can I use LMDeploy?
> lmdeploy hasn't supported window attention yet.

I mean, if I use a LongLoRA model, can I use LMDeploy to deploy it without using window attention?
I meet the same problem. @tridao Do you have a recommended way to solve it?
> [code](https://github.com/OpenLMLab/MOSS-RLHF/blob/main/ppo/ppo_datahelper.py#L201) When computing GAE for each token position, the reward[t] at that position is needed. But when penalized_rewards is computed, the reward is only added at the last step, i.e. penalized_rewards[-1] += rewards[i]; at every other position, penalized_rewards contains only the KL penalty. Do the rewards at those states need to be accounted for?

@ruizheng20 Could you please clarify this? I have the same doubt: why is the reward only added via penalized_rewards[-1] += rewards[i]? In the code, rank_all is False; if it were set to True, wouldn't every token then get its own reward, which would conflict with adding the reward only at penalized_rewards[-1]?
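For reference, here is a minimal sketch of the reward shaping being discussed, as it is commonly done in RLHF PPO implementations: a dense per-token KL penalty plus the scalar reward-model score added only at the final token, after which GAE backs that terminal reward up to earlier positions. This is not the MOSS-RLHF code; the function names (`shape_rewards`, `gae`) and the `kl_coef` parameter are hypothetical, and the hyperparameters are just placeholders.

```python
import torch

def shape_rewards(logprobs, ref_logprobs, reward_score, kl_coef=0.1):
    """logprobs / ref_logprobs: (T,) per-token log-probs; reward_score: scalar RM score."""
    kl = logprobs - ref_logprobs              # per-token KL estimate vs. the reference policy
    penalized = -kl_coef * kl                 # dense KL penalty at every position
    penalized[-1] += reward_score             # sequence-level reward added only at the last token
    return penalized

def gae(penalized_rewards, values, gamma=1.0, lam=0.95):
    """Standard GAE; values: (T,) state values, bootstrap value after the last token assumed 0."""
    T = penalized_rewards.shape[0]
    advantages = torch.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        delta = penalized_rewards[t] + gamma * next_value - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    return advantages, advantages + values    # advantages and returns
```

In this shaping, intermediate tokens have no explicit reward because the reward model scores only the complete response; with gamma close to 1, the terminal reward still reaches earlier positions through the GAE recursion via the value estimates, which is why many implementations add it only at the last token.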