PaddleNLP
UIE crashes intermittently with a core dump and no error message
Problem description
/ssd1/liuyaping/env/lib/python3.7/site-packages/pandas/util/testing.py:20: DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  from pandas._libs import testing as _testing
/ssd1/liuyaping/env/lib/python3.7/site-packages/scipy/sparse/sputils.py:16: DeprecationWarning: np.typeDict is a deprecated alias for np.sctypeDict.
  supported_dtypes = [np.typeDict[x] for x in supported_dtypes]
/ssd1/liuyaping/env/lib/python3.7/site-packages/scipy/special/orthogonal.py:81: DeprecationWarning: np.int is a deprecated alias for the builtin int. To silence this warning, use int by itself. Doing this will not modify any behavior and is safe. When replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  from numpy import (exp, inf, pi, sqrt, floor, sin, cos, around, int,
/ssd1/liuyaping/env/lib/python3.7/site-packages/scipy/linalg/__init__.py:217: DeprecationWarning: The module numpy.dual is deprecated. Instead of using dual, use the functions directly from numpy or scipy.
  from numpy.dual import register_func
[2022-09-19 18:06:42,804] [ INFO] - Downloading resource files...
[2022-09-19 18:06:42,808] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'uie-base'.
W0919 18:06:42.849376 338382 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
W0919 18:06:42.856266 338382 gpu_resources.cc:91] device: 0, cuDNN Version: 7.0.
train.sh: line 33: 338382 Segmentation fault /ssd1/liuyaping/env/bin/python3.7 finetune.py --train_path ./data/train.txt --dev_path ./data/dev.txt --save_dir ./checkpoint --learning_rate 1e-5 --batch_size 16 --max_seq_len 256 --num_epochs 300 --model uie-base --seed 100 --logging_steps 10 --valid_steps 100
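You could check whether GPU memory ran out. The output above does not contain much information, so the problem is hard to diagnose. Alternatively, send us your data and we can test it.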
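Because the process dies with a bare `Segmentation fault`, one low-cost way to get more information is Python's standard `faulthandler` module, which writes the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. The following is a minimal sketch, assuming it is called at the very top of `finetune.py`; the helper name `enable_crash_traceback` and the log file name are made up for illustration.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_traceback.log"):
    """Write the Python traceback of all threads to log_path on a fatal signal.

    The returned file object must stay open for the lifetime of the process,
    because faulthandler writes to its file descriptor at crash time.
    """
    log_file = open(log_path, "w")
    # Handles SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Running the script as `python -X faulthandler finetune.py ...` enables the same handler without code changes and writes to stderr. The traceback only shows which Python call was active when the native code crashed; it does not explain the crash on its own.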
It ran out of host memory (RAM): there is a memory leak during data preprocessing.
train_ds = load_dataset(reader, data_path=args.train_path, max_seq_len=args.max_seq_len, lazy=False)
dev_ds = load_dataset(reader, data_path=args.dev_path, max_seq_len=args.max_seq_len, lazy=False)
We recommend setting lazy to True in this code. It may alleviate the core dump to some extent.
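The difference between the two settings can be illustrated without PaddleNLP. The sketch below is plain Python, not the library's implementation: `read_examples` is a hypothetical stand-in for the `reader` function, and `tracemalloc` reports the peak memory Python allocates when the examples are materialized into a list (roughly what `lazy=False` implies) versus consumed one at a time (roughly what `lazy=True` implies).

```python
import tracemalloc


def read_examples(n):
    """Hypothetical reader: yields n preprocessed examples one by one."""
    for i in range(n):
        yield {"input_ids": list(range(256)), "id": i}


def peak_memory(consume):
    """Return the peak number of bytes allocated while consume() runs."""
    tracemalloc.start()
    consume()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def eager(n=2000):
    examples = list(read_examples(n))  # every example is held in memory at once
    return len(examples)


def lazy(n=2000):
    count = 0
    for _ in read_examples(n):  # each example can be freed after it is used
        count += 1
    return count


eager_peak = peak_memory(eager)
lazy_peak = peak_memory(lazy)
print(f"eager peak: {eager_peak} bytes, lazy peak: {lazy_peak} bytes")
```

Lazy loading only lowers memory use when the growth comes from holding every example at once. If memory grows for another reason, such as objects that are never released inside the preprocessing step itself, switching to lazy loading would not be expected to help.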
I tried it and it does not solve the problem. For now, the only workaround seems to be reducing the amount of data.
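One way to reduce the amount of data without loading the whole file first is reservoir sampling, which keeps a uniform random subset of lines in a single pass. This is a sketch under the assumption that the training file holds one example per line; the function name `sample_lines` and the default seed are arbitrary choices for illustration.

```python
import random


def sample_lines(path, k, seed=100):
    """Return k lines chosen uniformly at random from a text file.

    Uses reservoir sampling (Algorithm R): the file is read once and at most
    k lines are held in memory. If the file has fewer than k lines, all of
    them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i < k:
                reservoir.append(line)
            else:
                j = rng.randint(0, i)  # randint is inclusive on both ends
                if j < k:
                    reservoir[j] = line
    return reservoir
```

The sampled lines can then be written to a smaller file and passed to the training script in place of the original one.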
train_ds = load_dataset(reader, data_path=args.train_path, max_seq_len=args.max_seq_len, lazy=False)
dev_ds = load_dataset(reader, data_path=args.dev_path, max_seq_len=args.max_seq_len, lazy=True)
This produces the following error:
File "/ssd1/lyp/python38/lib/python3.8/site-packages/paddle/fluid/dataloader/batch_sampler.py", line 116, in __init__
    assert not isinstance(dataset, IterableDataset),
AssertionError: dataset should not be a paddle.io.IterableDataset
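Yes, exactly, one error after another. I am working on this too and it is really hard. It looked to me as if the ratio used for constructing negative samples was the cause, but I still get errors after adjusting it.
This is a bug, and I have not figured out how to fix it yet.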
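The AssertionError above occurs because the multi-GPU sampler does not support IterableDataset; it comes from how the DataLoader is set up. You can switch to single-GPU training and change the sampler to a plain BatchSampler.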
https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/BatchSampler_cn.html#batchsampler
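The assertion reflects a general constraint: a batch sampler produces lists of indices, so it needs a dataset with a known length and random access, which a stream-style `IterableDataset` does not provide. The sketch below shows the alternative in plain Python. It is not Paddle's implementation, and `stream_examples` and `batched` are hypothetical names: batches are formed by consuming the stream in order instead of by sampling indices.

```python
from itertools import islice


def stream_examples(n):
    """Hypothetical stand-in for an iterable-style dataset of n examples."""
    for i in range(n):
        yield {"id": i}


def batched(iterable, batch_size, drop_last=False):
    """Group a stream into lists of batch_size items, in arrival order."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        if len(batch) < batch_size and drop_last:
            return
        yield batch


batches = list(batched(stream_examples(10), batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

`paddle.io.DataLoader` appears to handle an `IterableDataset` in a similar way when `batch_size` is passed and `batch_sampler` is left unset, though this should be verified against the documentation of the installed Paddle version. Shuffling is not possible in this mode without buffering, because the stream has no indices to permute.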

I am already training on a single GPU.
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.