LLaMA-Factory reward model: scores from do_predict differ from scores returned by the deployed API

Reminder

[X] I have read the README and searched the existing issues.

System Info

llamafactory version: 0.8.4.dev0
Platform: Linux-5.15.0-88-generic-x86_64-with-glibc2.35
Python version: 3.9.18
PyTorch version: 2.3.0 (GPU)
Transformers version: 4.41.2
Datasets version: 2.18.0
Accelerate version: 0.32.0
PEFT version: 0.12.0
TRL version: 0.9.6
GPU type: NVIDIA A100 80GB PCIe
DeepSpeed version: 0.15.0
vLLM version: 0.5.0

Reproduction

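The following two methods give inconsistent scores on the same batch of data.

Method 1: deploy a trained reward model locally:
API_PORT=8001 llamafactory-cli api --model_name_or_path xxx --template qwen --stage rm

The score is then obtained as follows: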

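```python
import json

import requests
from transformers import AutoTokenizer

# Assumption: the tokenizer is loaded from the same (placeholder) model path used above.
tokenizer = AutoTokenizer.from_pretrained("xxx")


def make_text(instruct, output):
    # Render the whole conversation, including the response to be scored, as one string.
    prompt = "You are a helpful assistant."
    messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": instruct},
        {"role": "assistant", "content": output}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return text


def get_score(instruct, output):
    # Post the rendered text to the locally deployed scoring endpoint.
    text = make_text(instruct, output)
    data = {
        "model": "qwen2.5_3B_style_rm_3k",
        "messages": [text]
    }
    r = requests.post("http://127.0.0.1:8001/v1/score/evaluation", data=json.dumps(data))
    return json.loads(r.text)["scores"][0]
```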

Method 2:
llamafactory-cli train xxx.yaml

YAML contents:

model_name_or_path: xxx

stage: rm
do_train: false
do_eval: false
do_predict: true

eval_dataset: xxx
template: qwen
cutoff_len: 1024
max_samples: 10000
overwrite_cache: true
preprocessing_num_workers: 16

output_dir: xxx

per_device_eval_batch_size: 1


### Expected behavior

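Method 1 gives relatively low scores, and the chosen response outscores the rejected one in only 60% of pairs.
Method 2 gives higher scores, and the chosen response outscores the rejected one in 100% of pairs.

I would like to know whether the problem lies in my deployment or in my evaluation.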
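For reference, the "chosen > rejected" ratio compared above can be computed as in the minimal sketch below. The helper name `pairwise_accuracy` and the example scores are hypothetical and made up for illustration; they are not taken from this issue or from LLaMA-Factory.

```python
from typing import Sequence


def pairwise_accuracy(chosen_scores: Sequence[float], rejected_scores: Sequence[float]) -> float:
    """Fraction of pairs whose chosen response scores strictly higher than the rejected one."""
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("chosen and rejected score lists must have the same length")
    if not chosen_scores:
        raise ValueError("need at least one pair")
    wins = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return wins / len(chosen_scores)


# Made-up scores for illustration only; not results from the issue.
example_chosen = [0.9, 0.2, 1.5, -0.3, 0.7]
example_rejected = [0.1, 0.4, 1.0, -0.1, 0.2]
example_accuracy = pairwise_accuracy(example_chosen, example_rejected)
```

Running both scoring methods on the same pairs and feeding their scores through the same function keeps the comparison like for like.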

### Others

_No response_

Nov 08 '24 12:11 vxfla

Hi, the API is deployed locally and has no public endpoint. I started the service the way the llamafactory documentation describes, then wrote the call myself based on the parameters listed in the API documentation.

(In reply to: "@vxfla Hi, I get a 404 when I call the API your way. Did you change anything? Thanks.")

Nov 18 '24 12:11 vxfla


My mistake, the call works when I follow your approach. One question about the reward score: does a higher score mean a better answer?

Nov 18 '24 15:11 world2025


Hey, has this been solved?

Dec 12 '24 07:12 zh-hike

@vxfla Could `add_generation_prompt=True` be the problem here? The reward should be computed at the eot_id token.
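To make this concern concrete, here is a minimal sketch of what `add_generation_prompt` does to the end of the rendered text. It assumes a ChatML-style layout similar to Qwen's; `render_chatml` is a hypothetical stand-in, not the real tokenizer template or LLaMA-Factory's `qwen` template, and the thread does not confirm that this is the actual cause of the mismatch.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool) -> str:
    """Hypothetical helper approximating a ChatML-style (Qwen-like) chat template."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Opens a new, empty assistant turn after the response being scored.
        text += "<|im_start|>assistant\n"
    return text


# Illustrative conversation; the contents are made up.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]

with_prompt = render_chatml(messages, add_generation_prompt=True)
without_prompt = render_chatml(messages, add_generation_prompt=False)
```

Under this assumed layout, `with_prompt` ends with an extra `<|im_start|>assistant\n` header, so the final token is no longer the end-of-turn token (`<|im_end|>` here, the counterpart of `eot_id`). If the reward is read at the last token, the two renderings could therefore score differently.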

Dec 12 '24 08:12 namezhenzhang

I found the same problem. Any progress?

Apr 04 '25 09:04 leofbank

Same problem! The results produced by llamafactory-cli api simply differ from those produced by llamafactory-cli train --do_predict True, and the API's are worse.

Apr 16 '25 06:04 OPilgrim

Has this problem been solved? Any advice would be appreciated.

Jun 13 '25 07:06 snowyrain

When you say the API's results are worse, were you testing on data sampled from the training set?

Jun 13 '25 10:06 snowyrain

Did you manage to solve this problem?

Jun 29 '25 03:06 kfchenhn

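I ran into the same problem. Has it been solved?
1. What score counts as correct? The scores from the API deployment all seem very low.
2. The message is plain text and I don't know how to construct it. For now I concatenate the qwen template myself, and I'm not sure that is right.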

Aug 04 '25 12:08 waywayyang