ERNIE: Question about text similarity scores when testing the Python example code in ERNIE\applications\tasks\text_matching

Open lbz0920 opened this issue 3 years ago • 2 comments

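In the test data file infer.txt, comparing two short texts that are completely different ("What part-time job can I do from home on a computer" vs. "Haier fully automatic washing machine") produces: run_infer.py:50 * 9640 ('在家电脑做什么兼职好呢\t海尔全自动洗衣机', '[0.22760319709777832, 0.7723968029022217]'). What do the two fields in the result mean? Is there any documentation for them? And how can I get a similarity score for the two texts?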

Jun 29 '22 11:06 lbz0920

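[0.22760319709777832, 0.7723968029022217] are, for the two texts, [probability of no match, probability of match]. Pointwise trains the matching task as a classification problem, while pairwise trains it by comparing the similarity of positive and negative sample pairs. At inference time, however, both output the probability that the sample pair matches; neither outputs a similarity score for the two texts. If you need a similarity score, you can work backwards from the code at https://github.com/PaddlePaddle/ERNIE/blob/94a2367ba7f0f83b48330233450ea095d8dc9382/applications/tasks/text_matching/model/ernie_matching_siamese_pairwise.py#L112. For this example the similarity is (this approach only applies to pairwise matching tasks): 0.7723968029022217*2-1=0.5447936058044434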
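
The conversion above can be sketched in a few lines of Python. This is only an illustration, not code from the ERNIE repository: the helper names parse_match_probs and similarity_from_match_prob are hypothetical, and the sketch assumes, as the arithmetic in the reply implies, that the pairwise model maps a similarity in [-1, 1] to a match probability via p_match = (sim + 1) / 2.

```python
import ast
from typing import Tuple


def parse_match_probs(raw: str) -> Tuple[float, float]:
    """Parse the '[p_not_match, p_match]' string shown in the run_infer.py log line."""
    p_not_match, p_match = ast.literal_eval(raw)
    return float(p_not_match), float(p_match)


def similarity_from_match_prob(p_match: float) -> float:
    """Invert p_match = (sim + 1) / 2 (an assumption; pairwise models only)."""
    return 2.0 * p_match - 1.0


# Second field of the log tuple quoted in the question.
raw = "[0.22760319709777832, 0.7723968029022217]"
p_not_match, p_match = parse_match_probs(raw)
score = similarity_from_match_prob(p_match)
print(f"p_not_match={p_not_match:.4f} p_match={p_match:.4f} similarity={score:.4f}")
```

For a pointwise model the second value is a classifier probability, so this inversion would not be meaningful there.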

Jul 05 '22 05:07 webYFDT

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reopen it. Thank you for your contributions.

Sep 20 '22 18:09 stale[bot]
