RL4LMs

Reproducing existing results on NarrativeQA

Open yxk23 opened this issue 11 months ago • 0 comments

I'm trying to reproduce the NarrativeQA results by running the command directly with the provided .yml configuration files. The scores below are ROUGE-L-Max. For PPO with supervision, I got 0.581 at epoch 0 and 0.588 at epoch 99. For NLPO with supervision, I got 0.217 at epoch 0 and 0.213 at epoch 99.

I'm wondering why the result for NLPO doesn't match the reported result in the paper.

I also tried taking the PPO config and changing only the RL algorithm to NLPO, and I got the same result as above.
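To make the comparison between the two setups concrete, here is a minimal sketch of how the two configs could be diffed once each .yml is loaded into a dict (for example with PyYAML's `yaml.safe_load`). The helper names `flatten` and `diff_configs` and every config key in the example are hypothetical placeholders, not the actual RL4LMs config schema.

```python
def flatten(cfg, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} form."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(left, right, missing="<missing>"):
    """Return {path: (left_value, right_value)} for every entry that differs."""
    a, b = flatten(left), flatten(right)
    changes = {}
    for path in sorted(set(a) | set(b)):
        va, vb = a.get(path, missing), b.get(path, missing)
        if va != vb:
            changes[path] = (va, vb)
    return changes


# Hypothetical configs: the keys and values are illustrative only.
ppo_cfg = {"alg": {"id": "ppo", "args": {"n_steps": 128}}}
nlpo_cfg = {"alg": {"id": "nlpo", "args": {"n_steps": 128, "top_mask": 0.9}}}

for path, (old, new) in diff_configs(ppo_cfg, nlpo_cfg).items():
    print(f"{path}: {old} -> {new}")
```

A diff like this shows whether the two runs differ only in the algorithm name or also in algorithm-specific arguments.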

Please let me know if I'm missing something or if it's some other issue. Thanks!

yxk23, Jul 07 '23 16:07