Flat-Lattice-Transformer
The clip in the data-preprocessing code does not take effect
Running on a 32 GB card with clip left at its default of 200, it overflows even with batch-size=1, which was pretty frustrating. It turns out the clip in preprocess doesn't actually clip anything: the longest sentence comes out at 1146 characters, so the overflow is hardly surprising... I've modified the code and am pasting it below; anyone hitting the same problem can use it as a reference.
import argparse
import os
import sys

import numpy as np

sys.path.append('../')
from paths import *

parser = argparse.ArgumentParser()
parser.add_argument('--clip_msra', default=True, action='store_true')
parser.add_argument('--clip_size', default=200, type=int,
                    help='soft clip: only cut at a boundary once the segment reaches this length')
parser.add_argument('--max_seq_len', default=300, type=int,
                    help='segments longer than this are skipped entirely')
parser.add_argument('--train_set', default='train_bio')
parser.add_argument('--test_set', default='test_bio')
args = parser.parse_args()

# Characters at which a segment may be cut once clip_size is reached.
segment_split = [',', '!', '.', '。', '!', '……', '?', '?', ',', '...']

# Merge the multi-char lexicon entries and the unigram file into the
# combined char-and-word embedding vocabulary.
lexicon_f = open(yangjie_rich_pretrain_word_path, 'r')
char_f = open(yangjie_rich_pretrain_unigram_path, 'r')
output_f = open(yangjie_rich_pretrain_char_and_word_path, 'w')
for l in lexicon_f.readlines():
    l_split = l.strip().split()
    if len(l_split[0]) != 1:
        print(l.strip(), file=output_f)
for l in char_f.readlines():
    print(l.strip(), file=output_f)
lexicon_f.close()
char_f.close()
output_f.close()


def need_clip(now_segment, current_line):
    """Decide whether to cut the running segment before this 'char tag' line."""
    if len(current_line) <= 1:
        return True
    # Cut only once the segment has reached clip_size, and only at a punctuation
    # character or at a line whose label starts with 'E'. Note the boundary line
    # itself is not written to either segment.
    if len(now_segment) >= args.clip_size and (
            current_line[0] in segment_split or current_line[1][0].lower() == 'e'):
        return True
    return False


def create_cliped_file(fp):
    segment = ''                                   # raw lines of the current segment
    data_set, data_char, now_segment = [], [], []  # kept segments, their chars, chars of the current one
    with open(fp, 'r', encoding='utf-8') as file:
        for line in file:
            line_split = line.strip().split()
            if len(line_split) <= 0:   # skip the blank separator lines of the source file
                continue
            if need_clip(now_segment=now_segment, current_line=line_split):
                # Flush the segment; anything longer than max_seq_len is skipped entirely.
                if len(now_segment) <= args.max_seq_len:
                    data_set.append(segment)
                    data_char.append(now_segment)
                now_segment = []
                segment = ''
            else:
                now_segment.append(line_split[0])
                segment += line
    with open(f'{fp}_clip_{args.clip_size}', 'w', encoding='utf-8') as file:
        for seg in data_set:
            file.write(seg + '\n\n')
    res = np.array([len(seq) for seq in data_char])
    overflow = np.count_nonzero([seq_len > args.clip_size for seq_len in res])
    print(f'Max-Seq-Len in this dataset is: {np.max(res)}')
    print(f'There are {overflow} sentences longer than the clip size.')


if args.clip_msra:
    create_cliped_file(os.path.join(msra_ner_cn_path, args.test_set))
    create_cliped_file(os.path.join(msra_ner_cn_path, args.train_set))
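For a quick sanity check that the clip actually took effect, something like the sketch below can be run on the generated file. The file name here is a hypothetical example of the "<input>_clip_<clip_size>" naming used above; adjust the path to your own layout.

# Minimal sanity check for a clipped file, e.g. 'train_bio_clip_200' (hypothetical path).
def max_segment_len(path):
    max_len, cur = 0, 0
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():                 # a "char tag" line of the current segment
                cur += 1
            else:                            # blank line(s) separate segments
                max_len, cur = max(max_len, cur), 0
    return max(max_len, cur)

print(max_segment_len('train_bio_clip_200'))  # should stay <= --max_seq_len (300)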
Thanks, man! With your code the sentences in the MSRA dataset are now truncated, but training still throws an OOM. Did you run into this too?
train:7026
train max_seq_len:387
train max_lex_num:233
train max_seq_lex:601
test max_seq_len:351
test max_lex_num:231
test max_seq_lex:572
loading vocabulary file /home/llq/.fastNLP/embedding/bert-chinese-wwm/vocab.txt
Load pre-trained BERT parameters from file /home/llq/.fastNLP/embedding/bert-chinese-wwm/chinese_wwm_pytorch.bin.
Start to generate word pieces for word.
Found(Or segment into word pieces) 116198 words out of 116559.
training epochs started 2020-12-15-13-55-24
Traceback (most recent call last):
File "flat_main.py", line 588, in <module>
trainer.train()
File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/fastNLP/core/trainer.py", line 613, in train
self.callback_manager.on_exception(e)
File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/fastNLP/core/callback.py", line 309, in wrapper
returns.append(getattr(callback, func.__name__)(*arg))
File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/fastNLP/core/callback.py", line 505, in on_exception
raise exception # 抛出陌生Error
File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/fastNLP/core/trainer.py", line 609, in train
self._train()
File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/fastNLP/core/trainer.py", line 697, in _train
eval_res = self._do_validation(epoch=epoch, step=self.step)
File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/fastNLP/core/trainer.py", line 714, in _do_validation
res = self.tester.test()
File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/fastNLP/core/tester.py", line 165, in test
pred_dict = self._data_forward(self._predict_func, batch_x)
File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/fastNLP/core/tester.py", line 213, in _data_forward
y = self._predict_func_wrapper(**x)
File "/home/llq/homework/Flat-Lattice-Transformer/models.py", line 511, in forward
embedding, seq_len, lex_num=lex_num, pos_s=pos_s, pos_e=pos_e)
File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/llq/homework/Flat-Lattice-Transformer/modules.py", line 1277, in forward
rel_pos_embedding = self.four_pos_fusion_embedding(pos_s,pos_e)
File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/llq/homework/Flat-Lattice-Transformer/modules.py", line 110, in forward
pe_2 = torch.cat([pe_ss,pe_ee],dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 3.43 GiB (GPU 0; 10.91 GiB total capacity; 7.58 GiB already allocated; 2.49 GiB free; 7.87 GiB reserved in total by PyTorch)
Try lowering the batch size; give 16 a shot.
I set the batch size to 2 with gradient accumulation over 5 steps, but it still blew up partway through...
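For reference, gradient accumulation in plain PyTorch follows roughly the toy sketch below (a generic illustration with a stand-in linear model, not the fastNLP Trainer's own loop). It only simulates a larger effective batch; the activation memory of each micro-batch is unchanged, so a single over-long sequence can still OOM.

import torch
import torch.nn as nn

# Toy setup just to illustrate the pattern.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
data = [(torch.randn(2, 10), torch.randint(0, 2, (2,))) for _ in range(20)]  # micro-batches of 2

accum_steps = 5                                # 2 * 5 -> effective batch of 10
optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average over the window
    loss.backward()                            # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                       # update once per accumulation window
        optimizer.zero_grad()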
It's probably the test_batch being too large (it's set to 10); debugging shows the crash happens during evaluation.
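The evaluation pass has its own batch size, and that alone bounds its peak memory regardless of the training batch and accumulation settings. A generic toy sketch of the idea (not the fastNLP Tester itself):

import torch
import torch.nn as nn

# Evaluation builds no autograd graph; peak memory is bounded by the evaluation batch size.
@torch.no_grad()
def evaluate(model, inputs, batch_size=4):
    model.eval()
    preds = []
    for i in range(0, len(inputs), batch_size):   # small chunks keep peak memory low
        preds.append(model(inputs[i:i + batch_size]).argmax(dim=-1).cpu())
    model.train()
    return torch.cat(preds)

model = nn.Linear(10, 2)                          # toy stand-in for the FLAT model
print(evaluate(model, torch.randn(10, 10)).shape)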
Ah, did you maybe not clear the cache? From your error log: if the sentences had gone through my code there wouldn't be any sample longer than 300; anything over 300 gets skipped on my side.
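A hedged sketch of what "clear the cache" amounts to here: the preprocessed dataset is pickled to disk, so after re-clipping the files the stale pickles have to be removed, otherwise the old over-long samples are loaded again. The 'cache/' directory below is an assumption; point the pattern at wherever your run actually writes its cache files.

import glob
import os

# Assumption: cached dataset pickles live under a local 'cache/' directory.
for cached in glob.glob('cache/*'):
    print('removing stale cache file:', cached)
    os.remove(cached)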
I adjusted the length and cleared the cache; at this point the error seems to come from the test batch being too large. Anyway, thanks for your code 😄
Your code solved the OOM problem for me, thanks a lot!
Hi, does the "cache" here refer to the cached dataset? (My understanding is that it doesn't need clearing, since it lives on disk and is only loaded at run time.) Or is there some step during training that prevents the GPU memory from blowing up? Thanks.