Rajkumar Ramamurthy comments


BART supervised

@talent404 This is a known issue. We have a PR coming up to fall back to data parallelism when model parallelism is not available. I will keep you posted.

For self-dialogue training, I think you need to update the following:
- env (https://github.com/allenai/RL4LMs/blob/main/rl4lms/envs/text_generation/env.py): This is where you need to update the dynamics. For example, once is reached, add...

Implementing self-play

Yes feel free to contribute :)

Just a warning: the package doesn't work with Transformers 4.25.1.

Yeah, true. If we upgrade Transformers, we need to update hf_generation_utils.py too.

100% likely that two function parameters have been merged by accident

Oh right. This seems like a merging issue. The good thing is that the argument is unused/overridden at the moment, which is why it still works without any run-time errors.

Top-K and Top-p sampling

This top-p mask is quite different from typical top-p sampling; it is specific to the NLPO algorithm. Before sampling, we generate a top-p mask from the mask policy (a...
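As a rough illustration of the masking step described above, here is a minimal pure-Python sketch. It assumes the mask policy produces logits over the same vocabulary as the policy being sampled from, and that the mask is applied by setting every token outside the mask policy's nucleus to -inf before sampling. The function names (`top_p_mask`, `masked_sample`) are hypothetical and are not part of the RL4LMs API.

```python
import math
import random
from typing import List, Set


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax; -inf logits map to probability 0.0.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_mask(mask_logits: List[float], p: float) -> Set[int]:
    # Hypothetical helper: the smallest set of token ids whose cumulative
    # probability under the *mask policy* reaches p (the nucleus).
    probs = softmax(mask_logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep: Set[int] = set()
    cum = 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    return keep


def masked_sample(policy_logits: List[float], mask_logits: List[float],
                  p: float, rng: random.Random) -> int:
    # Hypothetical helper: build the mask from the mask policy, then sample
    # from the other policy restricted to the allowed tokens.
    allowed = top_p_mask(mask_logits, p)
    masked = [x if i in allowed else -math.inf
              for i, x in enumerate(policy_logits)]
    probs = softmax(masked)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Unlike ordinary top-p sampling, where the nucleus is computed from the sampling distribution itself, here the nucleus comes from the mask policy and only restricts which tokens the sampled policy may choose; within that set, the sampled policy's own probabilities decide.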