LoL-RL

Published 20 hours ago


Metadata

Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
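The title describes improving a language model with advantage-based offline policy gradients. As a rough illustration only, here is a generic advantage-weighted offline policy-gradient surrogate. It is not the repository's implementation. It assumes that each full output sequence is treated as a single action and that a reference-policy value estimate serves as the baseline (advantage = reward minus value). It also keeps only samples with positive advantage and clips the importance weight pi_theta(y|x) / pi_ref(y|x) to [1 - eps, 1 + eps]. All of these choices are assumptions, as are the hypothetical `Sample` fields and the default `eps`. Log-probabilities are plain floats here. In an autodiff framework the weight would be detached, so that the gradient of each term is advantage * weight * grad log pi_theta(y|x).

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One offline (prompt, output) pair; field names are hypothetical."""
    logp_policy: float  # log pi_theta(y|x), summed over the output tokens
    logp_ref: float     # log pi_ref(y|x) under the frozen reference model
    reward: float       # sequence-level reward R(x, y)
    value_ref: float    # assumed baseline: value estimate V_ref(x) of the prompt


def advantage_weighted_objective(batch: List[Sample], eps: float = 0.2) -> float:
    """Mean of advantage * clipped_weight * log pi_theta over kept samples.

    In a real trainer the weight is treated as a constant (detached), so the
    gradient of each term is advantage * weight * grad log pi_theta(y|x).
    """
    terms = []
    for s in batch:
        advantage = s.reward - s.value_ref
        if advantage <= 0:
            continue  # assumption: drop samples with non-positive advantage
        ratio = math.exp(s.logp_policy - s.logp_ref)  # pi_theta / pi_ref
        weight = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clipped importance weight
        terms.append(advantage * weight * s.logp_policy)
    return sum(terms) / len(terms) if terms else 0.0
```

Under these assumptions, training would minimize the negative of this objective over batches drawn from the fixed offline dataset. No new samples are generated during training, which is what makes the method offline.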

Issues

2 LoL-RL issues


No module named 'utils.data_utils'

1

While running **python lolrl_qlora_llama_hh.py --sampling_strategy good_priority**, the logs show an error message like the one below: [2024-03-19 18:59:01,658] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) Traceback (most recent call last): File "path/to/LoL-RL/lolrl_qlora_llama_hh.py", line...

popoala

unable to import utils

3

Hi, thanks for releasing the codebase; it's really helpful. It seems that I am unable to import utils: for example, `from utils import save_in_jsonl, distinctness, load_from_pickle` in data_cleaning.py, `save_in_jsonl, distinctness, load_from_pickle`...

JiuhaiChen
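Both reports above fail when a script imports from a package named `utils`. One common cause of this kind of error is an assumption here, not a confirmed diagnosis of these issues. The repository root may not be on `sys.path`, or another installed package also named `utils` may be found first and shadow the local one. The following sketch uses the standard `importlib.util.find_spec` to report where a top-level name would be imported from, without executing it. The helper names `locate_module` and `check_repo_utils` are hypothetical.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_module(name: str) -> Optional[Path]:
    """Return the file or directory a top-level module would load from, or None."""
    if "." in name:
        # find_spec on a dotted name imports the parent package first; avoid that.
        raise ValueError("pass a top-level name such as 'utils'")
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing with this name is importable
    if spec.submodule_search_locations:
        # A package: report the directory it would be loaded from.
        return Path(list(spec.submodule_search_locations)[0]).resolve()
    if spec.origin and spec.has_location:
        return Path(spec.origin).resolve()  # a plain .py (or extension) module
    return None  # built-in or frozen module with no file on disk


def check_repo_utils(repo_root: str, name: str = "utils") -> str:
    """Say whether `name` resolves inside repo_root, elsewhere, or not at all."""
    root = Path(repo_root).resolve()
    found = locate_module(name)
    if found is None:
        return f"'{name}' is not importable; run from {root} or add it to sys.path"
    if found == root or root in found.parents:
        return f"'{name}' resolves inside the repo: {found}"
    return f"'{name}' is shadowed by {found} (outside {root})"
```

For example, calling `print(check_repo_utils("path/to/LoL-RL"))` from the same environment and working directory as the training script would distinguish a missing package from a shadowed one. Only the top-level name is looked up, which keeps the check free of side effects.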

About


Topics: reinforcement-learning, natural-language-processing, language-model, policy-gradient

Stars: 23
Forks: 6

Owner: abaheti95
