mst272
After fine-tuning, running the HumanEval evaluation with the evaluation code in the repo fails with the error Failed to extract code block with error `list index out of range`:
```
>>> Task: Python/2
>>> Output:
def truncate_number(number: float) -> float:
    """ Given a positive floating point number, it can be decomposed into and integer part (largest integer smaller than...
```
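The log suggests the model returned bare code with no triple-backtick fence around it. If the evaluation script finds the code by splitting the output on a fence and indexing the pieces (an assumption, not confirmed by the log), an unfenced output yields a one-element list, and taking index `[1]` raises exactly this `IndexError`. Below is a minimal sketch of a more forgiving extractor. `extract_code_block` is a hypothetical helper, not a function from the repo.

```python
import re

# Build the fence from single backticks so the pattern stays readable.
FENCE = "`" * 3
# Opening fence, optional language tag (python, py, c++ ...), newline,
# then the shortest body up to the next fence.
FENCED_BLOCK = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_block(completion: str) -> str:
    """Hypothetical helper: return the code contained in a model completion.

    Never indexes into an empty list, so unfenced output cannot raise
    IndexError the way a split-and-index approach can.
    """
    # Case 1: at least one complete fenced block, so take the first one.
    matches = FENCED_BLOCK.findall(completion)
    if matches:
        return matches[0]

    # Case 2: an opening fence with no closing fence, e.g. generation
    # stopped at the token limit. Keep everything after the fence line.
    start = completion.find(FENCE)
    if start != -1:
        newline = completion.find("\n", start)
        return completion[newline + 1:] if newline != -1 else ""

    # Case 3: no fence at all (bare code, as in the Task: Python/2 output).
    return completion
```

Falling back to the raw completion trades strictness for robustness. A reply that mixes prose with unfenced code will then fail when the test harness executes it, not at extraction time.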
> [rank0]: Traceback (most recent call last):
> [rank0]:   File "/opt/tmp/nlp/wzh/LLM-Dojo/rlhf/rloo_train.py", line 167, in
> [rank0]:     trainer.train()
> [rank0]:   File "/home/nlp/miniconda3/envs/codellm2/lib/python3.9/site-packages/trl/trainer/rloo_trainer.py", line 246, in train
> [rank0]:     query_response, logits =...
I see that the paper says the annotator can be adjusted through a prompt, but TRL's implementation uses a score instead. Is this different from the paper?