PaddleNLP

UIE crashes intermittently with a core dump and no error message

Open · hnlslyp opened this issue 2 years ago · 9 comments

Problem description

/ssd1/liuyaping/env/lib/python3.7/site-packages/pandas/util/testing.py:20: DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  from pandas._libs import testing as _testing
/ssd1/liuyaping/env/lib/python3.7/site-packages/scipy/sparse/sputils.py:16: DeprecationWarning: np.typeDict is a deprecated alias for np.sctypeDict.
  supported_dtypes = [np.typeDict[x] for x in supported_dtypes]
/ssd1/liuyaping/env/lib/python3.7/site-packages/scipy/special/orthogonal.py:81: DeprecationWarning: np.int is a deprecated alias for the builtin int. To silence this warning, use int by itself. Doing this will not modify any behavior and is safe. When replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  from numpy import (exp, inf, pi, sqrt, floor, sin, cos, around, int,
/ssd1/liuyaping/env/lib/python3.7/site-packages/scipy/linalg/__init__.py:217: DeprecationWarning: The module numpy.dual is deprecated. Instead of using dual, use the functions directly from numpy or scipy.
  from numpy.dual import register_func
[2022-09-19 18:06:42,804] [ INFO] - Downloading resource files...
[2022-09-19 18:06:42,808] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'uie-base'.
W0919 18:06:42.849376 338382 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.2
W0919 18:06:42.856266 338382 gpu_resources.cc:91] device: 0, cuDNN Version: 7.0.
train.sh: line 33: 338382 Segmentation fault /ssd1/liuyaping/env/bin/python3.7 finetune.py --train_path ./data/train.txt --dev_path ./data/dev.txt --save_dir ./checkpoint --learning_rate 1e-5 --batch_size 16 --max_seq_len 256 --num_epochs 300 --model uie-base --seed 100 --logging_steps 10 --valid_steps 100

hnlslyp · Sep 19 '22

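Try checking whether GPU memory is running out. The output above doesn't contain much information, so it's hard to track down the problem. Alternatively, send us your data and we'll test it.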

wawltor · Sep 19 '22

It ran out of memory: there is a memory leak during data preprocessing.

natureLanguageQing · Sep 20 '22
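To check whether memory during preprocessing simply scales with the amount of data (everything held in memory at once) or keeps climbing, one option is to measure peak Python-level allocation with the standard-library `tracemalloc` module at a few input sizes. This is a minimal sketch: `fake_preprocess` is a hypothetical stand-in for the UIE reader, not PaddleNLP code, and `tracemalloc` does not see native (C++/CUDA) allocations, so it cannot rule out a leak inside Paddle itself.

```python
import tracemalloc
from typing import Any, Callable, List, Tuple


def measure_peak_bytes(fn: Callable[..., Any], *args: Any) -> Tuple[Any, int]:
    """Run fn(*args) and return (result, peak bytes allocated by Python during the call)."""
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears the recorded traces
    return result, peak


def fake_preprocess(num_examples: int, max_seq_len: int) -> List[List[int]]:
    # Hypothetical stand-in: pads every example to max_seq_len and keeps all of them.
    return [[0] * max_seq_len for _ in range(num_examples)]


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        _, peak = measure_peak_bytes(fake_preprocess, n, 256)
        print(f"{n} examples -> peak {peak / 1e6:.1f} MB")
```

If the peak roughly doubles when the input doubles, memory is proportional to the data rather than leaking; a leak would instead show memory growing across repeated calls with the same input.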

train_ds = load_dataset(reader, data_path=args.train_path, max_seq_len=args.max_seq_len, lazy=False)
dev_ds = load_dataset(reader, data_path=args.dev_path, max_seq_len=args.max_seq_len, lazy=False)
In the code above, I suggest setting lazy to True. That may alleviate the coredump to some extent.

natureLanguageQing · Sep 20 '22
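The difference between eager and lazy loading can be sketched with plain Python generators, assuming the training file uses the JSON-lines layout (one example per line) that UIE's `doccano.py` conversion writes. `iter_examples`, `read_lazy` and `read_eager` are hypothetical helpers illustrating the idea, not PaddleNLP's `load_dataset`. Note that a lazy stream has no `len()` and no random access.

```python
import json
from typing import Iterable, Iterator, List


def iter_examples(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON example per non-empty line, one line at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_lazy(path: str) -> Iterator[dict]:
    """Roughly the lazy=True idea: examples are parsed only when consumed."""
    with open(path, encoding="utf-8") as f:
        yield from iter_examples(f)


def read_eager(path: str) -> List[dict]:
    """Roughly the lazy=False idea: every parsed example is held in memory at once."""
    return list(read_lazy(path))
```

Because `read_lazy` is a generator, only the current line is parsed at any moment, whereas `read_eager` keeps every parsed example alive until the list is released.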

I tried it and it didn't help. For now, the only workaround seems to be reducing the amount of data.

natureLanguageQing · Sep 20 '22
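If shrinking the training data is the workaround for now, one way to draw a random subset from a large JSON-lines file without loading the whole file is reservoir sampling (Algorithm R). This is a minimal sketch; the function names and the default seed (chosen only to mirror `--seed 100` in the command above) are illustrative assumptions, not part of PaddleNLP.

```python
import random
from typing import Iterable, List


def reservoir_sample(lines: Iterable[str], k: int, seed: int = 100) -> List[str]:
    """Return a uniform random sample of up to k lines in a single pass (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)  # fill the reservoir with the first k lines
        else:
            j = rng.randint(0, i)  # inclusive bounds: j is uniform over 0..i
            if j < k:
                reservoir[j] = line  # line i survives with probability k / (i + 1)
    return reservoir


def subsample_file(src_path: str, dst_path: str, k: int, seed: int = 100) -> int:
    """Write a k-line random subset of src_path to dst_path; return the number of lines written."""
    with open(src_path, encoding="utf-8") as src:
        subset = reservoir_sample(src, k, seed)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in subset:
            # Guard against a final line without a trailing newline.
            dst.write(line if line.endswith("\n") else line + "\n")
    return len(subset)
```

For example, `subsample_file("./data/train.txt", "./data/train_small.txt", 10_000)` would produce a smaller file to pass as `--train_path`; the output file name is hypothetical. Only the k sampled lines are kept in memory, not the whole file.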

With
train_ds = load_dataset(reader, data_path=args.train_path, max_seq_len=args.max_seq_len, lazy=False)
dev_ds = load_dataset(reader, data_path=args.dev_path, max_seq_len=args.max_seq_len, lazy=True)
I get the following error:
  File "/ssd1/lyp/python38/lib/python3.8/site-packages/paddle/fluid/dataloader/batch_sampler.py", line 116, in __init__
    assert not isinstance(dataset, IterableDataset),
AssertionError: dataset should not be a paddle.io.IterableDataset

hnlslyp · Sep 21 '22

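Yes, exactly, one error after another. I'm working on it too, and it's really tough. It looks to me like the cause is the negative-sample construction ratio, but I still get the error after adjusting it.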

natureLanguageQing · Sep 22 '22
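Assuming the negative-sample ratio mentioned above is the `--negative_ratio` option of UIE's `doccano.py` conversion (for extraction tasks, the PaddleNLP docs give the maximum number of negatives as negative_ratio × number of positives), a back-of-the-envelope estimate shows how quickly the converted dataset, and the memory needed to hold it eagerly, grows with that ratio. The field count and bytes per token id are assumptions for illustration, not values read from the UIE code.

```python
def max_converted_examples(num_positive: int, negative_ratio: int) -> int:
    """Upper bound on converted examples: positives plus at most negative_ratio * positives negatives."""
    return num_positive * (1 + negative_ratio)


def padded_id_bytes(num_examples: int, max_seq_len: int,
                    num_fields: int = 4, bytes_per_id: int = 8) -> int:
    """Bytes for num_fields padded id arrays per example (assumed: 4 fields, packed int64)."""
    return num_examples * max_seq_len * num_fields * bytes_per_id


if __name__ == "__main__":
    for ratio in (1, 5, 10):
        n = max_converted_examples(10_000, ratio)
        mb = padded_id_bytes(n, 256) / 1e6
        print(f"negative_ratio={ratio}: up to {n} examples, ~{mb:.0f} MB as packed int64")
```

Held as Python lists of ints rather than packed arrays, each id costs considerably more than 8 bytes, so actual usage would be higher than this estimate.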

This is a bug, and I haven't figured out how to fix it yet.

natureLanguageQing · Sep 22 '22

> With
> train_ds = load_dataset(reader, data_path=args.train_path, max_seq_len=args.max_seq_len, lazy=False)
> dev_ds = load_dataset(reader, data_path=args.dev_path, max_seq_len=args.max_seq_len, lazy=True)
> I get the following error:
>   File "/ssd1/lyp/python38/lib/python3.8/site-packages/paddle/fluid/dataloader/batch_sampler.py", line 116, in __init__
>     assert not isinstance(dataset, IterableDataset),
> AssertionError: dataset should not be a paddle.io.IterableDataset

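This happens because the multi-GPU sampler does not support IterableDataset; the error is caused by the DataLoader. You can switch to single-GPU training and, at the same time, replace the sampler with an ordinary BatchSampler: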

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/BatchSampler_cn.html#batchsampler


wawltor · Sep 22 '22
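As far as I can tell, Paddle's index-based batch samplers (`paddle.io.BatchSampler`, `paddle.io.DistributedBatchSampler`) build each batch from dataset indices, which requires a sized, indexable dataset, while a lazily loaded dataset is only a stream. For an `IterableDataset`, `paddle.io.DataLoader` is normally given `batch_size` directly (with no `batch_sampler` and no shuffling), and batches are formed sequentially. The stdlib sketch below shows that sequential batching in isolation; `sequential_batches` is a hypothetical helper, not a Paddle API.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sequential_batches(stream: Iterable[T], batch_size: int,
                       drop_last: bool = False) -> Iterator[List[T]]:
    """Group a stream into consecutive batches without needing len() or indexing."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))  # pull at most batch_size items
        if not batch:
            return  # stream exhausted
        if drop_last and len(batch) < batch_size:
            return  # discard the incomplete final batch
        yield batch
```

Because batches come out in file order, any shuffling has to be applied to the data beforehand.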

I am already training on a single GPU.

hnlslyp · Sep 27 '22

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] · Dec 07 '22

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] · Dec 21 '22