Rajkumar Ramamurthy
@talent404 This is a known issue. We have a PR coming up that falls back to data parallelism when model parallelism is not available. I will keep you posted.
For self-dialogue training, I think you need to update the following:
- [env](https://github.com/allenai/RL4LMs/blob/main/rl4lms/envs/text_generation/env.py) - This is where you need to update the dynamics. For example, once is reached, add...
Yes, feel free to contribute :)
Yeah, true. If we upgrade `transformers`, we need to update `hf_generation_utils.py` too.
Oh right. This looks like a merge issue. The good thing is that the argument is unused/overridden at the moment, which is why it still works without any run-time errors.
This top-p mask is quite different from typical top-p sampling. It is particular to the NLPO algorithm. Before sampling, we generate a top-p mask from the mask policy (a...
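
To make the distinction concrete, here is a minimal sketch of the idea, not the RL4LMs implementation. It assumes the mask comes from one set of logits (the mask policy) and is applied to a different set of logits (the policy being sampled from). The function names `top_p_mask` and `masked_probs` and the toy logits are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_mask(mask_logits, top_p):
    # Hypothetical helper: keep the smallest set of tokens whose cumulative
    # probability under the *mask policy* reaches top_p.
    probs = softmax(mask_logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)
    cumulative = 0.0
    for i in order:
        keep[i] = True
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return keep


def masked_probs(policy_logits, keep):
    # Hypothetical helper: apply the mask to the *current policy's* logits
    # and renormalise. Masked-out tokens get probability 0.
    masked = [x if k else -math.inf for x, k in zip(policy_logits, keep)]
    return softmax(masked)


# Toy example: the mask policy prefers tokens 0 and 1, while the current
# policy is uniform. Sampling is restricted to the tokens the mask keeps.
keep = top_p_mask([2.0, 1.0, 0.0, -1.0], top_p=0.8)
probs = masked_probs([0.0, 0.0, 0.0, 0.0], keep)
```

In typical top-p sampling, the nucleus is computed from the same distribution that is then sampled. In this sketch the nucleus is chosen by one model and the probabilities come from another, which is the difference the comment above points at.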