
AMR: some nodes are missing anchors information and have wrong labels

Open SoaringTiger opened this issue 3 years ago • 2 comments

Describe the bug AMR model: MRP2020_AMR_ZHO_MENGZI_BASE

Example 1: with the input ["我", "不", "吃饭"] ("I", "not", "eat"), the node for 吃饭 in the result has "anchors": []

{
        "id": "0",
        "input": "我 不 吃饭",
        "nodes": [
            {
                "id": 0,
                "label": "我",
                "anchors": [
                    {
                        "from": 0,
                        "to": 1
                    }
                ]
            },
            {
                "id": 1,
                "label": "-",
                "anchors": [
                    {
                        "from": 2,
                        "to": 3
                    }
                ]
            },
            {
                "id": 2,
                "label": "吃饭-01",
                "anchors": []
            }
        ],
        "edges": [
            {
                "source": 2,
                "target": 1,
                "label": "polarity"
            },
            {
                "source": 2,
                "target": 0,
                "label": "arg0"
            }
        ],
        "tops": [
            2
        ],
        "framework": "amr"
    }
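A minimal sketch for spotting the problem in Example 1 automatically, assuming the parser output is available as a Python dict with the same shape as the JSON above. The helper name `nodes_without_anchors` is hypothetical and not part of HanLP.

```python
# Example 1 output from this issue, reduced to the fields the check needs.
graph = {
    "input": "我 不 吃饭",
    "nodes": [
        {"id": 0, "label": "我", "anchors": [{"from": 0, "to": 1}]},
        {"id": 1, "label": "-", "anchors": [{"from": 2, "to": 3}]},
        {"id": 2, "label": "吃饭-01", "anchors": []},
    ],
}


def nodes_without_anchors(mrp_graph):
    """Return (id, label) for every node whose "anchors" list is empty or absent.

    Hypothetical helper for illustration; it only inspects the dict.
    """
    return [
        (node["id"], node.get("label"))
        for node in mrp_graph.get("nodes", [])
        if not node.get("anchors")
    ]


missing = nodes_without_anchors(graph)
print(missing)
```

Running the same check over a batch of outputs would show how often anchors are dropped, which may help narrow down which inputs trigger the behavior.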

Example 2: with the input ["我", "吃饭"] ("I", "eat"), the node for 吃饭 has "label": "死-01" (死 means "die") and "anchors": []

{
        "id": "0",
        "input": "我 吃饭",
        "nodes": [
            {
                "id": 0,
                "label": "我",
                "anchors": [
                    {
                        "from": 0,
                        "to": 1
                    }
                ]
            },
            {
                "id": 1,
                "label": "死-01",
                "anchors": []
            }
        ],
        "edges": [
            {
                "source": 1,
                "target": 0,
                "label": "arg0"
            }
        ],
        "tops": [
            1
        ],
        "framework": "amr"
    }
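The wrong label in Example 2 can be caught with a rough heuristic: strip the numeric sense suffix from each label and check whether what remains occurs among the input tokens. This is only a sketch under that assumption. The function name `labels_not_in_input` is hypothetical, and AMR labels are not required to match input tokens, so abstract labels (such as the polarity node "-" in Example 1) would be flagged as well.

```python
import re

# Example 2 output from this issue, reduced to the fields the check needs.
graph = {
    "input": "我 吃饭",
    "nodes": [
        {"id": 0, "label": "我", "anchors": [{"from": 0, "to": 1}]},
        {"id": 1, "label": "死-01", "anchors": []},
    ],
}


def labels_not_in_input(mrp_graph):
    """Return (id, label) for nodes whose label stem is not an input token.

    Heuristic only: assumes the input string is space-separated tokens and
    that a sense suffix looks like "-01". Hypothetical helper, not HanLP API.
    """
    tokens = set(mrp_graph["input"].split())
    suspicious = []
    for node in mrp_graph["nodes"]:
        stem = re.sub(r"-\d+$", "", node["label"])  # "死-01" -> "死"
        if stem not in tokens:
            suspicious.append((node["id"], node["label"]))
    return suspicious


suspicious = labels_not_in_input(graph)
print(suspicious)
```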

Code to reproduce the issue Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Describe the current behavior When the examples above are tested on the online version, the labels are correct.

Expected behavior A clear and concise description of what you expected to happen.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • Python version: 3.9
  • HanLP version: 2.1b27

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

  • [x] I've completed this form and searched the web for solutions.

SoaringTiger · May 17 '22 12:05

What I have found so far: adding a punctuation mark makes the result normal. Input: ["我", "吃饭", "。"] Output:

{
        "id": "0",
        "input": "我 吃饭 。",
        "nodes": [
            {
                "id": 0,
                "label": "我",
                "anchors": [
                    {
                        "from": 0,
                        "to": 1
                    }
                ]
            },
            {
                "id": 1,
                "label": "吃饭-01",
                "anchors": [
                    {
                        "from": 2,
                        "to": 4
                    }
                ]
            }
        ],
        "edges": [
            {
                "source": 1,
                "target": 0,
                "label": "arg0"
            }
        ],
        "tops": [
            1
        ],
        "framework": "amr"
    }
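Based on the observation in this comment, one possible workaround is to append a sentence-final mark to the token list before parsing. The sketch below is an assumption-laden illustration, not a confirmed fix: `with_final_punct`, `anchor_text`, and the `FINAL_PUNCT` set are hypothetical names, and the anchors are treated as character offsets into the "input" string because that reading is consistent with the outputs shown in this issue.

```python
# Marks treated as sentence-final; an illustrative choice, not a HanLP setting.
FINAL_PUNCT = {"。", "！", "？", ".", "!", "?"}


def with_final_punct(tokens, mark="。"):
    """Return a copy of tokens that ends with a sentence-final mark."""
    if tokens and tokens[-1] in FINAL_PUNCT:
        return list(tokens)
    return list(tokens) + [mark]


def anchor_text(text, anchor):
    """Return the slice of the input string covered by one anchor.

    Assumes "from"/"to" are character offsets with an exclusive end.
    """
    return text[anchor["from"]:anchor["to"]]


padded = with_final_punct(["我", "吃饭"])
print(padded)

# The anchor reported for 吃饭-01 in the output above.
print(anchor_text("我 吃饭 。", {"from": 2, "to": 4}))
```

The padded token list would then be passed to the parser in place of the original one. Whether this helps for other inputs has not been verified here.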

SoaringTiger · May 18 '22 11:05

Interesting. The CAMR corpus may still be too formal.

hankcs · May 19 '22 02:05