advNLG
The reward used to update the generator in JGR
Hello! Thank you for this excellent and interesting work! While reading the code for the paper "JOINT GENERATOR-RANKER LEARNING FOR NATURAL LANGUAGE GENERATION", I noticed a possible issue: when computing the reward for updating the generator, the reward is the sum of two parts, reranker_rewards and metric_rewards. However, the two parts do not appear to be normalized before they are added, since self.args.normalize_rewards is False. As a result, reranker_rewards and metric_rewards differ by several orders of magnitude, yet the paper draws the conclusion that reranker_rewards is the more important term. Am I missing something? Thanks in advance for the clarification! The code below is from the compute_loss_generator function in JGR/trainer_utils/trainer.py.
self.reward_tracker['reranker_rewards'].append(reranker_rewards.detach().cpu().numpy().tolist())
self.reward_tracker['metric_rewards'].append(metric_rewards.detach().cpu().numpy().tolist())
if self.args.normalize_rewards:
    rererank_rewards_std = torch.std(reranker_rewards, dim=1, keepdim=True)
    metric_rewards_std = torch.std(metric_rewards, dim=1, keepdim=True)
    reranker_rewards = reranker_rewards / (rererank_rewards_std + eps)
    metric_rewards = metric_rewards / (metric_rewards_std + eps)
rewards = self.args.reranker_reward_scaler * reranker_rewards + self.args.metric_reward_scaler * metric_rewards
rewards = rewards.view(-1)  # (B*C,)
rewards = rewards.unsqueeze(1).expand_as(generated_probs)  # (B*C, L)
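To illustrate the concern, here is a minimal standalone sketch (not the repository's code) of how a scale gap between the two reward terms affects their sum, and how the per-sample std normalization in the if-branch above would put them on a comparable scale before the scalers are applied. The magnitudes and shapes below are assumptions for demonstration only.

import torch

eps = 1e-8
B, C = 2, 4  # batch size, candidates per sample (illustrative)

# Assumed magnitudes: raw ranker scores on the order of 10,
# a ROUGE-like metric reward on the order of 0.1.
reranker_rewards = 10.0 * torch.randn(B, C)
metric_rewards = 0.3 * torch.rand(B, C)

# Without normalization, the combined reward is dominated by the larger term.
raw_combined = reranker_rewards + metric_rewards

# With per-sample std normalization, both terms contribute at a similar scale.
norm_reranker = reranker_rewards / (torch.std(reranker_rewards, dim=1, keepdim=True) + eps)
norm_metric = metric_rewards / (torch.std(metric_rewards, dim=1, keepdim=True) + eps)
normalized_combined = norm_reranker + norm_metric

print(reranker_rewards.abs().mean(), metric_rewards.abs().mean())  # differ by ~2 orders of magnitude
print(norm_reranker.abs().mean(), norm_metric.abs().mean())        # comparable after normalization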