Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients

Results: 2 LoL-RL issues

While running **python lolrl_qlora_llama_hh.py --sampling_strategy good_priority**, the logs show an error message like the one below: [2024-03-19 18:59:01,658] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) Traceback (most recent call last): File "path/to/LoL-RL/lolrl_qlora_llama_hh.py", line...

Hi, thanks for releasing the codebase, it's really helpful. It seems that I am unable to import utils, for example, `from utils import save_in_jsonl, distinctness, load_from_pickle` in data_cleaning.py, `save_in_jsonl, distinctness, load_from_pickle`...