
sentence-order prediction

Open qiunlp opened this issue 5 years ago • 4 comments

BERT's NextSentencePrediction task is too easy. In ALBERT, to keep only the coherence objective and remove the influence of topic prediction, a new task called sentence-order prediction (SOP) was proposed. Question: where is this task implemented in your code?

qiunlp avatar Feb 21 '20 15:02 qiunlp

@yzgdjqwh It is described in my blog post, and you can find the corresponding code from there: https://lonepatient.top/2019/10/20/ALBERT.html

lonePatient avatar Feb 21 '20 15:02 lonePatient

I saw the following in your blog post. Which module of your code does it live in? I'm a beginner, please bear with me.

NSP: next-sentence prediction; true = two adjacent sentences, false = two random sentences

SOP: inter-sentence coherence prediction; true = two adjacent sentences in their original order, false = the same two adjacent sentences with their order swapped

if random.random() < 0.5:  # swap tokens_a and tokens_b
    is_random_next = True
    temp = tokens_a
    tokens_a = tokens_b
    tokens_b = temp
else:
    is_random_next = False
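
For context, here is a minimal sketch of how that swap turns into an SOP training example. The helper name create_sop_instance and the exact [CLS]/[SEP] layout are illustrative assumptions, not the repo's actual code in prepare_lm_data_ngram.py:

import random

def create_sop_instance(tokens_a, tokens_b):
    # Hypothetical helper: with probability 0.5 the two adjacent segments
    # are swapped, and the label records whether the order was changed.
    if random.random() < 0.5:
        tokens_a, tokens_b = tokens_b, tokens_a
        is_random_next = True   # segments are in the wrong order
    else:
        is_random_next = False  # segments are in the original order

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids, is_random_next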

qiunlp avatar Feb 21 '20 15:02 qiunlp

@yzgdjqwh It is in prepare_lm_data_ngram.py

lonePatient avatar Feb 21 '20 15:02 lonePatient

Thank you for answering my newbie questions so quickly. I want to use your pretrained model directly for a downstream task of judging English sentence order; which link should I download? I used the code below to call the model you provide online, but its output does not match human judgment. Please advise.

import torch
import numpy as np

from pytorch_pretrained_bert import BertTokenizer
from model.modeling_albert import AlbertConfig, AlbertForNextSentencePrediction

# Load the vocabulary and build the model's standard input format (same as BERT).
# (text_a, text_b and convert_examples_to_feature are defined elsewhere in my script.)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
input_ids = convert_examples_to_feature(text_a, text_b, tokenizer)
tokens_tensor = torch.tensor([input_ids])

# Load the model
config = AlbertConfig.from_pretrained("./prev_trained_model/albert_base/config.json")
model = AlbertForNextSentencePrediction.from_pretrained("./prev_trained_model/albert_base/pytorch_model.bin", config=config)
model.eval()

# Predict
out = model(tokens_tensor)
seq_relationship_scores = out[0]
sample = seq_relationship_scores.detach().numpy()
pred = np.argmax(sample, axis=1)
print(pred)
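
Since convert_examples_to_feature is not shown above, here is a minimal sketch of what such a helper might look like. It is hypothetical, not part of the repo, and assumes the standard [CLS] text_a [SEP] text_b [SEP] layout with a maximum length of 128:

def convert_examples_to_feature(text_a, text_b, tokenizer, max_seq_length=128):
    # Hypothetical helper: tokenize the two sentences, join them in the
    # standard BERT/ALBERT pair layout, and convert tokens to vocabulary ids.
    tokens_a = tokenizer.tokenize(text_a)
    tokens_b = tokenizer.tokenize(text_b)
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    tokens = tokens[:max_seq_length]
    return tokenizer.convert_tokens_to_ids(tokens)

Note that which index of seq_relationship_scores corresponds to "correct order" depends on how the labels were assigned during pretraining (see prepare_lm_data_ngram.py), so that mapping is worth checking when interpreting pred.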

qiunlp avatar Feb 22 '20 03:02 qiunlp