trl
Update in DPO raises several problems...
- get_hh was removed. There is no dataset now.
- from trl.commands.cli_utils import DpoScriptArguments, init_zero_verbose, TrlParser fails with: No module named 'trl.commands'
Those changes are currently only on main, did you install TRL from source?
Would you like to give #1456 a try?
python examples/scripts/dpo.py \
--dataset_name=trl-internal-testing/hh-rlhf-trl-style \
--model_name_or_path=gpt2 \
--per_device_train_batch_size 4 \
--max_steps 1000 \
--learning_rate 1e-3 \
--gradient_accumulation_steps 1 \
--logging_steps 10 \
--eval_steps 500 \
--output_dir="dpo_anthropic_hh" \
--warmup_steps 150 \
--report_to wandb \
--bf16 \
--logging_first_step \
--no_remove_unused_columns
With trl v0.13.0:
from trl.commands.cli_utils import TrlParser
Error:
ModuleNotFoundError: No module named 'trl.commands'
Solution:
from trl.scripts.utils import TrlParser
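If a script has to run against more than one TRL version, one option is to try the import paths mentioned in this thread in order. This is only a rough sketch: import_first and TRL_PARSER_MODULES are hypothetical names made up for illustration, not part of TRL, and the two module paths are simply the ones quoted above. Which of them exists depends on the installed version.

```python
import importlib

# Candidate module paths for TrlParser, newest first. Both are taken from this
# thread; which one exists depends on the installed TRL version (assumption).
TRL_PARSER_MODULES = ("trl.scripts.utils", "trl.commands.cli_utils")


def import_first(attr, module_paths):
    """Return `attr` from the first module in `module_paths` that provides it.

    Hypothetical helper, not a TRL API.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # module just moves us on to the next candidate.
            errors.append(f"{path}: {exc}")
    raise ImportError(f"could not import {attr!r}; tried: " + "; ".join(errors))


# Usage (requires TRL to be installed):
# TrlParser = import_first("TrlParser", TRL_PARSER_MODULES)
```

If neither path resolves, the raised ImportError lists every path that was tried, which makes it easier to see that the installed version has moved the module again.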