Keyword-BERT issues

Questions about forecast results: test_results.tsv

Hello @DataTerminatorX, I have some questions about the prediction results in test_results.tsv: 1. What do the generated probabilities mean? Should I choose the class with the higher probability? 2. Why does the test data have a high probability or...
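On the first question, here is a minimal sketch of how such a file could be read. It assumes test_results.tsv follows the layout written by the original BERT run_classifier.py: one line per test example, tab-separated class probabilities, no header. The helper name `predict_labels` and the sample rows are hypothetical and do not come from the repo.

```python
import csv
import io

def predict_labels(tsv_text):
    """Return (predicted_class_index, probability) for each row of class probabilities."""
    predictions = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        probs = [float(p) for p in row]
        # The predicted class is the column with the highest probability.
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append((best, probs[best]))
    return predictions

# Hypothetical two-class output: column 0 = label 0, column 1 = label 1.
sample = "0.91\t0.09\n0.23\t0.77\n"
print(predict_labels(sample))
```

Under that assumption, each row is a probability distribution over the labels, and taking the largest column gives the predicted label for that example.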

KangChou

Was the modified extract_features.py not uploaded? It is the same as the original BERT implementation, and I cannot find the keyword extractor in the repo

2

dongxiaohuang

Some questions about keyword extraction

2

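This work seems to depend heavily on the quality of keyword extraction. In real business settings, there is no guarantee that keywords can be extracted well in many scenarios, which makes the approach less practical. Have you carefully compared how different keyword extraction algorithms affect the overall model?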

Vincent131499

Data

1

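Hello, I have a question. When constructing the data to run the model, there are two files, create_pretraining_data.py and convert_to_bert_keyword.py. What does each of them do? Also, what are the input and output data formats of create_pretraining_data.py?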

zhx970928

Keyword system

2

Could you provide the code for the keyword system (data preprocessing and the like)?

kakaxisisan

Code error: run_squad.py -> read_baike_examples(input_file, is_training)

```
def read_baike_examples(input_file, is_training):
    """Read a baike txt file into a list of SquadExample"""
    with tf.gfile.Open(input_file, "r") as reader:
        for line in reader:
```

![image](https://user-images.githubusercontent.com/36963108/188535365-9da470b4-9db9-47f1-a839-87cc1b226a4f.png)

KangChou

Ratio of positive to negative samples

4

Do you have any recommendations for the ratio of positive to negative samples? Thanks.

kscp123

Confirming a syntax issue in the code

1

Hello, I have two questions about the match function in convert_to_bert_keyword.py: 1. The branch for English keywords still calls the Chinese matching function: def match(s, kws): kw_index = set() for kw in kws: if re.match(r'^[\u4e00-\u9fff]+$', kw): kw_index |= set(match_ch(s, kw)) elif re.match(r'^[a-zA-Z]+$', kw): kw_index |= set(match_ch(s, kw)) # my understanding is that this branch should do English matching else: continue return...
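For illustration, here is a hedged sketch of what a separate English branch could look like. `match_en` is a hypothetical helper, not a function from the repo. The `match_ch` below is a stand-in written under the assumption that the real one returns the character indices covered by each occurrence of the keyword in `s`. The return statement is truncated in the issue, so returning the index set is also an assumption.

```python
import re

def match_ch(s, kw):
    """Stand-in (assumed behavior): indices of every character covered by
    a plain substring occurrence of kw in s."""
    indices = []
    start = s.find(kw)
    while start != -1:
        indices.extend(range(start, start + len(kw)))
        start = s.find(kw, start + 1)
    return indices

def match_en(s, kw):
    """Hypothetical: like match_ch, but matches kw only as a whole word,
    ignoring case, so 'cat' does not match inside 'category'."""
    indices = []
    pattern = r"(?<![A-Za-z])" + re.escape(kw) + r"(?![A-Za-z])"
    for m in re.finditer(pattern, s, flags=re.IGNORECASE):
        indices.extend(range(m.start(), m.end()))
    return indices

def match(s, kws):
    kw_index = set()
    for kw in kws:
        if re.match(r"^[\u4e00-\u9fff]+$", kw):
            kw_index |= set(match_ch(s, kw))
        elif re.match(r"^[a-zA-Z]+$", kw):
            kw_index |= set(match_en(s, kw))
        # Any other keyword (mixed scripts, digits) is skipped, as in the issue's snippet.
    return kw_index  # assumption: the original return is truncated in the issue

print(sorted(match("I like cat and category", ["cat"])))
```

Whether whole-word matching is actually wanted depends on how the keywords were extracted. Plain substring matching, as in the snippet quoted in the issue, would also mark the first three letters of "category".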

EvelynZhaoShiMei

About the model structure and kw_mask

6

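## 1. Model structure

According to the paper, the keyword attention layer and a regular transformer layer are each attached after 11 regular transformer layers, but the source code does not appear to do this. At lines 212 and 226 of modeling.py, the model looks more like a two-tower structure in which only the embedding layer is shared?

## 2. kw_mask attention

When this mask is generated, the three rows for cls and sep, unless handled specially, should be filled entirely with -10000 before entering softmax. Then when softmax is applied to these three rows...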
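One property that bears on the second point is that softmax is invariant to adding the same constant to every entry. The sketch below assumes the repo keeps the additive masking used by the original BERT modeling.py, `(1.0 - mask) * -10000.0` added to the attention scores. The scores are hypothetical. Since the question is truncated in the issue, this may not address exactly what was asked.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw_scores = [2.0, -1.0, 0.5, 3.0]  # hypothetical attention logits for one row
mask_row = [0, 0, 0, 0]             # 0 = masked out, 1 = kept; here every position is masked

# Additive masking in the style of the original BERT modeling.py.
masked = [s + (1.0 - m) * -10000.0 for s, m in zip(raw_scores, mask_row)]

weights = softmax(masked)
unmasked_weights = softmax(raw_scores)

# Every entry was shifted by the same -10000, so the distribution is unchanged.
print(all(math.isclose(a, b) for a, b in zip(weights, unmasked_weights)))
```

Under that assumption, a fully masked row does not produce zeros or NaNs: it still sums to 1 and attends exactly as if no mask had been applied. If the row were instead overwritten with a constant -10000, the softmax output would be uniform.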

HuipengXu

About the kw_mask part

2

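First of all, thank you for open-sourcing the code. Reading your source, I found that the kw_mask generated by create_attention_mask_from_keyword_mask only contains the mapping from A's tokens to B's keywords. According to the comments in that part of the code, it seems it should contain the token-to-keyword mappings in both directions, A to B and B to A, so the code appears to be wrong.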
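To make the difference between the two readings concrete, here is a small sketch. `build_kw_mask` and its arguments are hypothetical and are not the repo's create_attention_mask_from_keyword_mask. It assumes segment id 0 marks sentence A and 1 marks sentence B, and it ignores special tokens such as [CLS] and [SEP].

```python
def build_kw_mask(segment_ids, is_keyword, bidirectional=True):
    """Hypothetical sketch: mask[i][j] == 1 when token i may attend to
    keyword token j located in the other sentence."""
    n = len(segment_ids)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Only keyword tokens in the *other* sentence can be attended to.
            if not is_keyword[j] or segment_ids[i] == segment_ids[j]:
                continue
            a_to_b = segment_ids[i] == 0 and segment_ids[j] == 1
            if a_to_b or bidirectional:
                mask[i][j] = 1
    return mask

segment_ids = [0, 0, 1, 1]  # two tokens in A, two tokens in B
is_keyword = [0, 1, 1, 0]   # second token of A and first token of B are keywords

for row in build_kw_mask(segment_ids, is_keyword, bidirectional=False):
    print(row)  # A-to-B only: the rows for B's tokens stay all zero
for row in build_kw_mask(segment_ids, is_keyword, bidirectional=True):
    print(row)  # both directions: B's tokens also see A's keyword
```

With `bidirectional=False` the rows for B's tokens are empty, which matches the behavior described in the issue. With `bidirectional=True` both directions are filled in.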

Htter
