
Top-K and Top-p sampling

Open boblee22 opened this issue 3 years ago • 1 comments

Hi, thanks for your great work!

I have a question about the sampling process. When both top-K and top-p are enabled (e.g., https://github.com/allenai/RL4LMs/blob/main/scripts/training/task_configs/common_gen/t5_nlpo.yml#L44-L51), isn't top-p just ignored because the K most likely next words are filtered and the probability mass is redistributed among only those K next words? Please correct me if my understanding is wrong. Thank you!

boblee22, Oct 19 '22 04:10

The top-p mask here is quite different from typical top-p sampling; it is specific to the NLPO algorithm. Before sampling, we generate a top-p mask from the mask policy (a copy of the policy from previous epochs). Depending on the generation kwargs, top-k is then applied on top of this mask, so top-p is not ignored. For details, please refer to our paper.
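
To make the order of operations concrete, here is a minimal sketch in plain Python. It is not the RL4LMs code: the function names (`top_p_mask`, `top_k_filter`, `masked_next_token_probs`) and the toy logits are hypothetical, and it assumes the mask is computed from the mask policy's distribution and then applied to the current policy's logits before top-k.

```python
import math

NEG_INF = float("-inf")

def softmax(logits):
    # Numerically stable softmax; -inf logits get probability 0.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_mask(mask_logits, top_p):
    # Keep the smallest set of tokens whose cumulative probability
    # under the *mask policy* reaches top_p.
    probs = softmax(mask_logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)
    cumulative = 0.0
    for i in order:
        keep[i] = True
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return keep

def top_k_filter(logits, top_k):
    # Standard top-k: drop everything below the k-th largest logit.
    # top_k <= 0 is treated as "disabled" (an assumption of this sketch).
    if top_k <= 0 or top_k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[top_k - 1]
    return [x if x >= threshold else NEG_INF for x in logits]

def masked_next_token_probs(policy_logits, mask_logits, top_p, top_k):
    # 1) Build the top-p mask from the mask policy (assumed older copy).
    keep = top_p_mask(mask_logits, top_p)
    # 2) Apply the mask to the current policy's logits.
    masked = [x if k else NEG_INF for x, k in zip(policy_logits, keep)]
    # 3) Apply top-k on top of the masked logits, then renormalize.
    return softmax(top_k_filter(masked, top_k))

# Toy example: token 0 is the current policy's favourite,
# but the mask policy's top-p set excludes it.
policy_logits = [2.0, 1.0, 0.5, 0.0]
mask_logits = [0.0, 3.0, 2.0, -1.0]
probs = masked_next_token_probs(policy_logits, mask_logits, top_p=0.9, top_k=2)
```

In this toy case the mask removes tokens 0 and 3 even though token 0 has the highest logit under the current policy, so the top-p step changes the result regardless of the top-k setting.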

rajcscw, Oct 22 '22 08:10