RL4LMs
A modular RL library to fine-tune language models to human preferences
Hey, first of all, thank you for this amazing repo! I am trying to use this repo with a model that does not have the `parallelize()` function (led -...
Hello, I would like to implement self-play dialogue training. For that, I guess I need to modify the episode rollout process by adding formatting like a speaker id at the start of...
I have tried using BART as a seq2seq-type model, from huggingface facebook/bart-large. This however throws an error saying that `.parallelize` doesn't exist. Has anyone been able to finetune bart...
Logging with the root logger, like `logging.info`, removes the possibility of controlling the log level of submodules separately. `logging.getLogger(__name__)` enables this (and is the recommended practice), by doing something like...
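A minimal sketch of the pattern the issue describes (the `rl4lms.envs` logger name below is illustrative, not necessarily the module path the repo would produce from `__name__`):

```python
import logging

# Inside a module, `__name__` resolves to its dotted path (e.g. "rl4lms.envs"),
# so each module gets its own named logger instead of the root logger.
logger = logging.getLogger("rl4lms.envs")

def rollout():
    logger.info("starting rollout")

# A consumer of the library can now tune just this subtree of loggers,
# without touching the root logger or other packages:
logging.getLogger("rl4lms").setLevel(logging.WARNING)

# Child loggers inherit the level from their nearest configured ancestor.
print(logger.isEnabledFor(logging.INFO))  # → False
```

Because levels propagate down the dotted hierarchy, setting a level on `"rl4lms"` silences `"rl4lms.envs"`, `"rl4lms.envs.text_generation"`, and so on, while loggers in other packages keep their own configuration.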
Looks like `generation_beam_constraints` doesn't exist or has been moved?
Has this library been tested with larger models such as GPT-J-6B and GPT-NeoX-20B? Are there plans to support larger models like these? Thanks.
https://github.com/allenai/RL4LMs/blob/main/rl4lms/envs/text_generation/policy/seq2seq_policy.py#L263 ![Screen Shot 2022-11-29 at 6 05 29 PM](https://user-images.githubusercontent.com/3231217/204667819-409cb407-726f-40d9-9d43-8eb0ef9617f5.png)
Hi, thanks for your great work! I have a question about the sampling process. When both top-K and top-p are enabled (e.g., https://github.com/allenai/RL4LMs/blob/main/scripts/training/task_configs/common_gen/t5_nlpo.yml#L44-L51), isn't top-p just ignored because the K...
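A toy sketch of the interaction being asked about, in plain Python (this mimics how HF-style logits warpers apply top-k and then top-p sequentially; it is not RL4LMs' actual sampling code, and the exact tie-breaking in the real implementation may differ):

```python
def top_k_then_top_p(probs, k, p):
    """Return the indices of tokens that survive top-k followed by top-p.

    probs: a full probability distribution over the vocabulary.
    k:     top-k keeps only the k most likely tokens.
    p:     top-p (nucleus) then keeps the smallest prefix of the
           surviving tokens whose renormalized cumulative mass >= p.
    """
    # Sort token indices by probability, most likely first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept = order[:k]                              # top-k filter
    # Renormalize over the survivors, as happens implicitly when the
    # masked logits are re-softmaxed before the top-p step.
    total = sum(probs[i] for i in kept)
    out, cum = [], 0.0
    for i in kept:                                # top-p filter
        out.append(i)
        cum += probs[i] / total
        if cum >= p:
            break
    return out

probs = [0.40, 0.30, 0.15, 0.10, 0.05]
print(top_k_then_top_p(probs, k=4, p=0.8))  # → [0, 1, 2]
print(top_k_then_top_p(probs, k=2, p=0.8))  # → [0, 1]
```

With `k=4`, top-p still removes a fourth token (renormalized cumulative mass reaches 0.8 after three tokens), so it is not a no-op in general. With a small `k=2`, however, the renormalized mass of the survivors hits `p` only at the last token, so top-p effectively changes nothing, which seems to be the scenario the question is pointing at.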