RL4LMs

A modular RL library to fine-tune language models to human preferences

Results: 48 RL4LMs issues

Hey, first of all, thank you for this amazing repo! I am trying to use this repo with a model that does not have the parallelize() function (led -...

Hello, I would like to implement self-play dialogue training. For that I guess I need to modify the episode rollout process by adding formatting like a speaker id at the start of...

I have tried using BART as a seq2seq-type model, from huggingface facebook/bart-large. This however throws an error saying that `.parallelise` doesn't exist. Has anyone been able to finetune bart...
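Two of the reports above hit a missing `parallelize()` on seq2seq models such as LED and BART. One possible workaround is to guard the call, sketched below; the function name `maybe_parallelize` and its signature are illustrative assumptions, not RL4LMs' actual API.

```python
# Hypothetical guard (illustrative, not RL4LMs code): call parallelize()
# only when the model implements it; models like BART or LED do not, so
# fall back to moving the whole model to a single device instead.
def maybe_parallelize(model, device="cpu"):
    if hasattr(model, "parallelize"):
        model.parallelize()  # model-parallel layer split (GPT-2/T5-style)
    else:
        model.to(device)     # single-device fallback for models
                             # that lack parallelize()
    return model
```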

Logging with the root logger, like `logging.info`, removes the possibility of controlling the log level of submodules separately. `logging.getLogger(__name__)` enables this (and is the recommended practice), by doing something like...
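For reference, the per-module logger pattern this report recommends looks like the following minimal sketch (not RL4LMs code); the `rl4lms.envs` names stand in for whatever `__name__` resolves to in each submodule.

```python
import logging

# Per-module loggers created with logging.getLogger(__name__) inherit their
# level from their dotted-name ancestors, so one subtree can be quieted
# independently. Calling logging.info() directly goes through the root
# logger and offers no such per-submodule control.
parent = logging.getLogger("rl4lms.envs")
child = logging.getLogger("rl4lms.envs.text_generation")

parent.setLevel(logging.WARNING)  # quiet only this subtree

assert not child.isEnabledFor(logging.INFO)  # level inherited from parent
assert child.isEnabledFor(logging.WARNING)
```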

Looks like `generation_beam_constraints` doesn't exist or has been moved?

Has this library been tested with larger models such as GPT-J-6B and GPT-NeoX-20B? Are there plans to support larger models like these? Thanks.

https://github.com/allenai/RL4LMs/blob/main/rl4lms/envs/text_generation/policy/seq2seq_policy.py#L263 ![Screen Shot 2022-11-29 at 6 05 29 PM](https://user-images.githubusercontent.com/3231217/204667819-409cb407-726f-40d9-9d43-8eb0ef9617f5.png)

Labels: good first issue, code enhancement

Hi, thanks for your great work! I have a question about the sampling process. When both top-K and top-p are enabled (e.g., https://github.com/allenai/RL4LMs/blob/main/scripts/training/task_configs/common_gen/t5_nlpo.yml#L44-L51), isn't top-p just ignored because the K...