wenjunyang
### Description

The `forward` method of `RankHingeLoss` first averages the cosine scores of the negative samples and then calls `margin_ranking_loss` on that mean. This is incorrect, because the `margin_ranking_loss` of the mean is not equal to the mean of the `margin_ranking_loss` over the individual negatives. After changing this, I saw a large improvement in model quality.

### Reference code after the fix

def forward(self, y_pred: torch.Tensor, y_true: torch.Tensor):
    """
    Calculate rank hinge loss.

    :param y_pred: Predicted result.
    :param y_true: Label.
    :return: Hinge loss computed by user-defined margin....
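The claim that the hinge of the mean differs from the mean of the hinges can be checked numerically. A minimal plain-Python sketch (the `hinge` helper and the sample scores are illustrative assumptions, not MatchZoo code); the two disagree whenever the hinge clips to zero for some negatives but not others:

```python
def hinge(pos, neg, margin=1.0):
    # margin_ranking_loss with target = 1: max(0, margin - (pos - neg))
    return max(0.0, margin - (pos - neg))

pos = 0.9              # positive-sample cosine score
negs = [0.95, -0.5]    # negative-sample cosine scores

# Original behaviour: average the negative scores first, then one hinge.
loss_of_mean = hinge(pos, sum(negs) / len(negs))

# Proposed fix: hinge per negative sample, then average the losses.
mean_of_loss = sum(hinge(pos, n) for n in negs) / len(negs)

print(loss_of_mean, mean_of_loss)  # 0.325 vs 0.525 -- not equal
```

Here the second negative is easy (its per-sample hinge is zero), so averaging it into the scores before applying the hinge understates the loss contributed by the hard negative.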
> ## 📌 Checklist before creating the PR
> * [x] I have created an issue for this PR for traceability
> * [x] The title follows the standard format:...
## 📌 Checklist before creating the PR

- [x] I have created an issue for this PR for traceability
- [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...
In `training_step` of `PPOTrainer`, the `advantage` and `penalty` are computed by the old actor while making experience, and the value loss is computed with clamping. I think these are incorrect. Here is the...
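For context, the clamped value loss being questioned is usually the PPO2-style clipped value objective: the new value prediction is kept within a clip radius of the value recorded when the experience was collected, and the pessimistic (larger) squared error is taken. A minimal plain-Python sketch (the function name `clipped_value_loss` and the sample numbers are assumptions for illustration, not the trainer's actual implementation):

```python
def clipped_value_loss(values, old_values, returns, clip_eps=0.2):
    # PPO2-style value clipping: restrict the new value prediction to
    # [old_value - clip_eps, old_value + clip_eps], then take the
    # element-wise max of the clipped and unclipped squared errors.
    losses = []
    for v, old_v, ret in zip(values, old_values, returns):
        v_clipped = old_v + max(-clip_eps, min(clip_eps, v - old_v))
        losses.append(max((v - ret) ** 2, (v_clipped - ret) ** 2))
    return sum(losses) / len(losses)

# Second sample moves 0.5 past its old value, so it gets clipped to 1.2,
# but the max() keeps the larger unclipped error anyway.
loss = clipped_value_loss(values=[0.5, 1.5],
                          old_values=[0.4, 1.0],
                          returns=[1.0, 1.0])
print(loss)  # 0.25
```

Whether this objective is appropriate depends on `old_values` really being the critic's predictions from experience-collection time, which is exactly the point the issue raises about reusing quantities computed by the old actor.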