PaddleNLP

Training on data annotated with Doccano is automatically killed partway through

Open PKQ1688 opened this issue 3 years ago • 3 comments

The log is as follows:

$ sh DocumentReview/UIETool/run.sh
[2022-08-02 14:28:41,079] [ INFO] - Downloading resource files...
INFO 2022-08-02 14:28:41,079 download.py:117] unique_endpoints {''}
INFO 2022-08-02 14:28:41,079 download.py:154] Found uie-base/vocab.txt
INFO 2022-08-02 14:28:41,080 download.py:117] unique_endpoints {''}
INFO 2022-08-02 14:28:41,081 download.py:154] Found uie-base/special_tokens_map.json
INFO 2022-08-02 14:28:41,081 download.py:117] unique_endpoints {''}
INFO 2022-08-02 14:28:41,081 download.py:154] Found uie-base/tokenizer_config.json
[2022-08-02 14:28:41,081] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'uie-base'.
W0802 14:28:41.101258 12087 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.6, Runtime API Version: 11.2
W0802 14:28:41.121047 12087 gpu_context.cc:306] device: 0, cuDNN Version: 8.1.
DocumentReview/UIETool/run.sh: line 17: 12087 Killed python DocumentReview/UIETool/finetune.py --train_path "data/doccano_data/fangwuzulin/train.txt" --dev_path "data/doccano_data/fangwuzulin/dev.txt" --save_dir "model/uie_model/new/fangwuzulin/" --learning_rate 1e-5 --batch_size 16 --max_seq_len 512 --num_epochs 100 --model "uie-base" --seed 1000 --logging_steps 10 --valid_steps 500 --device "gpu"

PKQ1688 avatar Aug 02 '22 07:08 PKQ1688

Hello, does this problem also occur when you train with the default example? https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/uie#%E8%AE%AD%E7%BB%83%E5%AE%9A%E5%88%B6

linjieccc avatar Aug 02 '22 10:08 linjieccc

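Does "default example" mean using the default dataset? On my side, the error is currently raised by load_dataset: train_ds = load_dataset(reader, data_path="data/doccano_data/train.txt", max_seq_len=512, lazy=False). Is there a way to skip problematic data when loading the dataset, something like a try?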

PKQ1688 avatar Aug 03 '22 01:08 PKQ1688

You can add a try..except around the data reading here to catch the abnormal data: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/uie/utils.py#L182
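A minimal sketch of that suggestion, not the actual reader in utils.py: it assumes the training file holds one JSON object per line and that each example needs the keys "content" and "prompt". The function name safe_reader and the REQUIRED_KEYS tuple are hypothetical and used only for illustration.

```python
import json
import logging
from typing import Dict, Iterator

logger = logging.getLogger(__name__)

# Assumption: keys every example must carry. Adjust to the real data format.
REQUIRED_KEYS = ("content", "prompt")


def safe_reader(data_path: str) -> Iterator[Dict]:
    """Yield one example per valid JSON line, skipping lines that fail to parse."""
    with open(data_path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                example = json.loads(line)
                if not isinstance(example, dict):
                    raise ValueError("expected a JSON object")
                missing = [k for k in REQUIRED_KEYS if k not in example]
                if missing:
                    raise ValueError(f"missing keys: {missing}")
            except ValueError as err:
                # json.JSONDecodeError is a subclass of ValueError,
                # so malformed JSON is caught here as well.
                logger.warning("Skipping line %d of %s: %s", line_no, data_path, err)
                continue
            yield example
```

In the setup from this thread, the same try/except would wrap the per-line body of the existing reader rather than replace it. Note that this only catches Python exceptions. A process that the operating system terminates with "Killed", as in the log above, cannot be intercepted by try/except.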

linjieccc avatar Aug 03 '22 06:08 linjieccc

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Dec 08 '22 02:12 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Dec 22 '22 16:12 github-actions[bot]