BertWithPretrained issues

Is single-machine multi-GPU training possible? I ran into the following problem when modifying the code

1

Traceback (most recent call last):
  File "/home/yons/workfiles/codes/opencodes/BertWithPretrained/Tasks/TaskForChineseNER.py", line 315, in <module>
    train(config)
  File "/home/yons/workfiles/codes/opencodes/BertWithPretrained/Tasks/TaskForChineseNER.py", line 132, in train
    loss, logits = model(input_ids=token_ids,  # [src_len, batch_size]
  File "/home/yons/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl...

DemonDamon

Question about sentence-pair classification during MLM pretraining

1

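Hello, I'd like to ask about sentence-pair pretraining. I read Task/TaskForPretraining.py, which combines the MLM and NSP tasks, and it prompted a question: for sentence-pair classification (that is, deciding whether sentence a and sentence b belong to the same class), is it enough to adjust the sentence-pair processing accordingly (setting the model input token_type_ids to [0] * (len(token_a_ids) + 2) + [1] * (len(token_b_ids) + 1)) and replace nsp_label with the sentence-pair label? Or is there another approach?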
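The segment layout in this question can be sketched as follows. This is only an illustration of the `token_type_ids` formula quoted above, not the repository's own preprocessing code: the helper name `build_pair_inputs` is hypothetical, and the default ids 101 and 102 assume the `[CLS]` and `[SEP]` entries of the bert-base-uncased vocabulary.

```python
def build_pair_inputs(token_a_ids, token_b_ids, cls_id=101, sep_id=102):
    """Assemble [CLS] a [SEP] b [SEP] together with matching segment ids.

    cls_id and sep_id are assumptions (bert-base-uncased values);
    substitute the ids from the vocabulary actually in use.
    """
    input_ids = [cls_id] + list(token_a_ids) + [sep_id] + list(token_b_ids) + [sep_id]
    # Segment 0 covers [CLS] + sentence a + first [SEP]  -> len(a) + 2 positions.
    # Segment 1 covers sentence b + final [SEP]          -> len(b) + 1 positions.
    token_type_ids = [0] * (len(token_a_ids) + 2) + [1] * (len(token_b_ids) + 1)
    return input_ids, token_type_ids


# Toy ids standing in for two tokenized sentences.
input_ids, token_type_ids = build_pair_inputs([7, 8, 9], [4, 5])
print(input_ids)
print(token_type_ids)
```

The two lists always have the same length, so the segment ids line up one to one with the tokens regardless of sentence length.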

done520

env

Hello, torch==1.5.0 requires at least Python 3.7; it cannot be installed in a Python 3.6 environment.

bluuuu21m

Pretraining on the songci and wiki2 datasets raises an error; the generated masked .pt file wiki_train_mlNone_rs2022_mr15_mtr8_mtur5.pt is only 1 KB

4

## Note: using the local MyMultiHeadAttention implementation from MyTransformer
[2022-11-27 15:03:35] - INFO: ## Using the token embedding weight matrix as the output layer weights! torch.Size([30522, 768])
[2022-11-27 15:03:38] - INFO: Cache file /home/********/博一/my_explore/BERT_learn/BertWithPretrained-main/data/WikiText/wiki_test_mlNone_rs2022_mr15_mtr8_mtur5.pt does not exist; reprocessing and caching!
## Reading raw data: 100%|██████████████| 4358/4358 [00:00

Phil-521

Question about training the MLM task from scratch

1

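Hello, and thank you for the code! I have a question about pretraining. The TaskForPretraining.py you provide actually continues pretraining from an already trained model. If I want to pretrain entirely from random initialization, does the learning strategy need to change, for example the initial learning rate or the decay schedule?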

ljjcoder

About attention_mask

1

Why does attention_mask mark valid tokens as False and padding tokens as True?
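One plausible reading, assuming the repository follows the convention of the `key_padding_mask` argument in PyTorch's `nn.MultiheadAttention`, is that the mask answers "should this position be ignored?" rather than "is this position valid?". Under that convention True marks padding, and the attention scores at those positions are pushed to negative infinity before the softmax. The pure-Python sketch below illustrates the idea; the function names and `pad_id=0` are assumptions for illustration, not the repository's code.

```python
import math


def padding_mask(token_ids, pad_id=0):
    """Return True where a position is padding and should be ignored.

    pad_id=0 is an assumption; use the padding id of the actual vocabulary.
    """
    return [t == pad_id for t in token_ids]


def masked_softmax(scores, mask):
    """Softmax over scores, giving zero weight to positions where mask is True.

    Assumes at least one position is unmasked.
    """
    masked = [float("-inf") if m else s for s, m in zip(scores, mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


token_ids = [101, 2023, 102, 0, 0]  # toy sequence with two padding positions
mask = padding_mask(token_ids)
weights = masked_softmax([1.0, 2.0, 0.5, 3.0, 3.0], mask)
print(mask)
print(weights)
```

Even though the padding positions carry the largest raw scores in this toy input, they receive exactly zero attention weight, and the remaining weights still sum to one.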

liu-xu20

In the TaskForPretraining task, the .pt file generated from my own dataset is too large. What can I do?

70557dzqc

IndexError: list index out of range when training the TaskForSQuADQuestionAnswering task

1

When running the TaskForSQuADQuestionAnswering training task, I often hit this error. What causes it? Iterating over each question (sample): 76%|████████████▉ | 16/21 [05:20

xuechaofei

Error when training the TaskForChineseNER.py task after changing the number of self.entities

1

![QQ截图20221012111148](https://user-images.githubusercontent.com/65322222/195241168-300edc43-4d96-471f-b1d7-cea474735485.png) The original code has 7 entities. After switching to my own dataset, which has 23 entities, an error is raised. How can this be resolved?

yfangZhang
