KAG: Does KAG's entity and relation extraction strictly follow the schema definition?

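After I defined a custom schema and committed it, the extraction results contained entity types and relation types that are not defined in the schema.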

Dec 04 '24 08:12 jerryHo123

See https://github.com/OpenSPG/KAG/blob/master/kag/examples/musique/builder/prompt/ner.py. In parse_response you can filter out any types that do not match your own schema.

Dec 04 '24 08:12 thundax-lyp

That only allows filtering by enumeration, doesn't it? Can it filter out every type that does not match the schema?

Dec 04 '24 09:12 jerryHo123

Also, that filters the prompt response. What I need is filtering at graph construction time.

Dec 04 '24 10:12 jerryHo123

    def __init__(self, language: Optional[str] = "en", **kwargs):
        super().__init__(language, **kwargs)
        # Types previously committed to OpenSPG for this project.
        self.schema = SchemaClient(project_id=self.project_id).extract_types()
        # Substitute the schema into the prompt template.
        self.template = Template(self.template).safe_substitute(schema=self.schema)

self.schema holds the types that were previously committed to OpenSPG.
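
A minimal sketch of a schema-driven filter, so that nothing has to be enumerated by hand. It assumes the committed type names are available as an iterable of strings and that each parsed entity is a dict carrying its type under a `category` key; both are assumptions about the data shape, and `filter_by_schema` is a hypothetical helper, not a KAG API.

```python
from typing import Dict, Iterable, List


def filter_by_schema(
    entities: Iterable[Dict[str, str]],
    allowed_types: Iterable[str],
    type_key: str = "category",
) -> List[Dict[str, str]]:
    """Keep only the entities whose type is present in the committed schema."""
    allowed = set(allowed_types)
    return [e for e in entities if e.get(type_key) in allowed]


# Hypothetical schema types and parsed LLM output, for illustration only.
schema_types = ["Person", "Works"]
raw_entities = [
    {"entity": "The Rezort", "category": "Works"},
    {"entity": "Zombie", "category": "Creature"},  # not in the schema
]

kept = filter_by_schema(raw_entities, schema_types)
```

Because the allowed set is derived from the schema itself, any type that was never committed is dropped. The same function could be applied to the extracted nodes just before they are written, if filtering the prompt response is too early for your case.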

Dec 04 '24 10:12 thundax-lyp

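OK, I'll take a look. One more question while I'm here: what are the prompt directory under builder, and the ner.py and std.py files in it, used for? I read the source but couldn't find where they are called.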

Dec 05 '24 03:12 jerryHo123

In /kag/builder/component/extractor/kag_extractor.py, KAGExtractor.__init__ loads them dynamically through PromptOp.load().

        self.ner_prompt = PromptOp.load(self.biz_scene, "ner")(
            language=self.language, project_id=self.project_id
        )

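In KAGExtractor.invoke you can see that named_entity_recognition first uses the ner prompt to extract entities; the entity names extracted at this stage may be ambiguous. named_entity_standardization then uses the std prompt to disambiguate them, and the resulting official_name becomes the entity's final name.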
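
The two-step flow can be sketched roughly as follows. This is an illustration only, not KAGExtractor's actual code: `extract_then_standardize`, `fake_ner`, and `fake_std` are hypothetical names, and the two stubs stand in for the LLM-backed ner and std prompts.

```python
from typing import Callable, Dict, List

Entity = Dict[str, str]


def extract_then_standardize(
    text: str,
    ner: Callable[[str], List[Entity]],
    std: Callable[[str, List[Entity]], Dict[str, str]],
) -> List[Entity]:
    """Run NER, then attach a disambiguated official_name to each entity."""
    entities = ner(text)
    official = std(text, entities)  # maps raw entity name -> official name
    return [
        # Fall back to the raw name when std returns nothing for an entity.
        {**e, "official_name": official.get(e["entity"], e["entity"])}
        for e in entities
    ]


# Stand-ins for the model calls; a real run would go through the LLM.
def fake_ner(text: str) -> List[Entity]:
    return [{"entity": "NYC", "category": "City"}]


def fake_std(text: str, entities: List[Entity]) -> Dict[str, str]:
    return {"NYC": "New York City"}


result = extract_then_standardize("I flew to NYC.", fake_ner, fake_std)
```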

Likewise, the solver applies the same extract-then-disambiguate processing to the question.

Dec 05 '24 05:12 thundax-lyp

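By the way, judging from the prompts, KAG relies on the model's internal knowledge to fill in official_name. For domain-specific terms the model may not know the official_name; in that case you can force a replacement with a dictionary, or use a fine-tuned domain model instead.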
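
A minimal sketch of the dictionary override, assuming each standardized entity is a dict with `entity` and `official_name` keys. `apply_term_dictionary` and the sample terms are hypothetical and only illustrate the idea.

```python
from typing import Dict, List

Entity = Dict[str, str]


def apply_term_dictionary(
    entities: List[Entity], term_dict: Dict[str, str]
) -> List[Entity]:
    """Overwrite official_name for entities found in a domain dictionary."""
    return [
        {**e, "official_name": term_dict[e["entity"]]}
        if e["entity"] in term_dict
        else e
        for e in entities
    ]


# Hypothetical domain dictionary: the model guessed wrong for "MI".
domain_terms = {"MI": "Myocardial Infarction"}
standardized = [
    {"entity": "MI", "official_name": "Michigan"},
    {"entity": "aspirin", "official_name": "Aspirin"},
]

fixed = apply_term_dictionary(standardized, domain_terms)
```

Entities that are not in the dictionary keep whatever name the model produced, so the dictionary only needs to cover terms the model is known to get wrong.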

Dec 05 '24 05:12 thundax-lyp

So if I create a new example project, do I also need to create a prompt directory with custom ner and std files to handle extraction and disambiguation?

Dec 05 '24 06:12 jerryHo123

No, the defaults are usually enough. You only need to write your own when the defaults do not meet your needs. You can also skip the extractor entirely and pull relations directly from another KG system. As the architecture diagram at https://github.com/OpenSPG/openspg shows, the builder only gets data into OpenSPG. How the data is produced is up to you and your data; in the end you just assemble it into the structure KGWriter expects and push it to KGWriter.

Dec 05 '24 06:12 thundax-lyp

Is it possible to skip the std.py step?

Dec 06 '24 03:12 pecanjk

KAG provides both schema-free extraction and schema-constrained extraction; users can choose either one.

Jan 14 '25 11:01 caszkgui

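In the config file you have to choose an extractor for unstructured data, and the options include schema-free and schema-constrained. As I understand it, the second one extracts strictly according to the configured schema. A question for everyone: do these settings in kag.config.yaml refer to CPU threads?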
num_threads_per_chain: 4 num_chains:

Sep 16 '25 13:09 kuibawansui