HanLP AMR parsing gets some numbers wrong


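Describe the bug

Example 1: 我给了他15万元。 (I gave him 150,000 yuan.) In the AMR parse result (screenshot "bug"), “15万” is not parsed correctly.

Example 2: 我给了他十五点八万元。 (I gave him 158,000 yuan.) In the parse result (screenshot "bug2"), “十五点八万” is not parsed correctly.

Example 3: 我给了他十元三角八分钱。 (I gave him 10 yuan, 3 jiao and 8 fen.) In the parse result (screenshot), “十元三角八分” is not parsed correctly.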

Code to reproduce the issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem.

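Describe the current behavior

After changing “15万” to “十五万”, the value is parsed correctly as “150000”, so the error most likely comes from the numeral conversion step. For reference, see https://github.com/microsoft/Recognizers-Text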
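To make the suspected failure mode concrete, here is a minimal sketch of a converter that accepts both Chinese and Arabic digits in front of 万/亿, a decimal part introduced by 点, and the currency units 元/角/分. This is not the code used by HanLP or perin_parser; every name in it (`parse_cn_number`, `parse_cn_money`, the lookup tables) is hypothetical and only illustrates the three cases reported above.

```python
from decimal import Decimal

# Illustrative tables; none of these names come from HanLP or perin_parser.
CN_DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = (("亿", 10**8), ("万", 10**4))  # largest unit first
MONEY_UNITS = (("元圆块", Decimal("1")), ("角毛", Decimal("0.1")),
               ("分", Decimal("0.01")))


def _digit(ch):
    """Value of a single Chinese or ASCII digit."""
    if ch in CN_DIGITS:
        return CN_DIGITS[ch]
    if ch in "0123456789":
        return int(ch)
    raise ValueError(f"not a digit: {ch!r}")


def _section(s):
    """Integer below 10**4, e.g. '十五' -> 15, '15' -> 15, '三百零二' -> 302."""
    total = current = 0
    for ch in s:
        if ch in SMALL_UNITS:
            total += (current or 1) * SMALL_UNITS[ch]  # a bare '十' means 10
            current = 0
        else:
            current = current * 10 + _digit(ch)
    return total + current


def _decimal(s):
    """Number without 万/亿; '点' or '.' starts the fractional digits."""
    whole, _, frac = s.replace("点", ".").partition(".")
    value = Decimal(_section(whole))
    if frac:
        value += Decimal("0." + "".join(str(_digit(ch)) for ch in frac))
    return value


def parse_cn_number(text, units=BIG_UNITS):
    """Convert e.g. '15万', '十五点八万' or '三亿五千万' to a Decimal."""
    for i, (ch, scale) in enumerate(units):
        if ch in text:
            head, _, tail = text.partition(ch)
            lower = units[i + 1:]  # only smaller units may appear on either side
            return parse_cn_number(head, lower) * scale + parse_cn_number(tail, lower)
    return _decimal(text)


def parse_cn_money(text):
    """Convert e.g. '十元三角八分' or '15万元' to a Decimal amount in yuan."""
    total, rest = Decimal(0), text
    for chars, scale in MONEY_UNITS:
        for ch in chars:
            if ch in rest:
                head, _, rest = rest.partition(ch)
                total += parse_cn_number(head) * scale
                break
    if rest not in ("", "钱", "整"):
        raise ValueError(f"unparsed remainder: {rest!r}")
    return total
```

With these definitions, `parse_cn_number("15万")` and `parse_cn_number("十五万")` both compare equal to 150000, `parse_cn_number("十五点八万")` to 158000, and `parse_cn_money("十元三角八分")` to `Decimal("10.38")`. `Decimal` is used instead of `float` so that a product such as 15.8 × 10000 stays exact.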

Expected behavior

The label should be displayed correctly. That said, looking at the output data, the anchors preserve the position in the original text, so the problem is not that serious 😄
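As a workaround sketch for the point about anchors: if a node's label is wrong, the original surface string can still be recovered from its anchors. The snippet below assumes MRP-style nodes, where each anchor is a pair of character offsets stored under `from` and `to`; the node shown is a hand-written, hypothetical example rather than real parser output, and `anchored_text` is an illustrative helper name.

```python
def anchored_text(sentence, node):
    """Join the substrings of `sentence` covered by the node's anchors."""
    return "".join(sentence[a["from"]:a["to"]] for a in node.get("anchors", []))


sentence = "我给了他15万元。"
# Hypothetical node: the label is wrong, but the anchors still point at "15万".
node = {"id": 3, "label": "15", "anchors": [{"from": 4, "to": 7}]}

surface = anchored_text(sentence, node)
print(surface)
```

The recovered string can then be passed to whatever numeral converter is preferred, independently of the label the parser produced.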

System information

OS Platform and Distribution (Linux Ubuntu 16.04):
Python version: 3.9
HanLP version: 2.1b23

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

[x] I've completed this form and searched the web for solutions.

Apr 15 '22 10:04 SoaringTiger

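Thanks for the feedback. There is indeed a problem with Chinese numeral parsing. I tried Microsoft's library, but it also fails on some cases that mix decimals with units, so I ended up fixing it myself. Please apply the patch: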

 pip3 install perin_parser -U

Apr 15 '22 16:04 hankcs

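As for the values that are missing entirely, that happens because the model did not predict them at all, not because a predicted value was converted incorrectly. There is no good fix for now; it may require joint learning with NER.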

Apr 15 '22 16:04 hankcs

Looking forward to it.

May 08 '22 13:05 cmdares


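I copied the demo from the official website, but testutility keeps throwing errors. What could be the cause? I tried several versions (1.8.3, 1.7.7, 1.7.6, 1.5.4) and they all fail.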

May 09 '22 02:05 tangYiQun
