PaddleNLP
Training on data annotated with Doccano: the process gets killed automatically during training
The log is as follows:

$ sh DocumentReview/UIETool/run.sh
[2022-08-02 14:28:41,079] [ INFO] - Downloading resource files...
INFO 2022-08-02 14:28:41,079 download.py:117] unique_endpoints {''}
INFO 2022-08-02 14:28:41,079 download.py:154] Found uie-base/vocab.txt
INFO 2022-08-02 14:28:41,080 download.py:117] unique_endpoints {''}
INFO 2022-08-02 14:28:41,081 download.py:154] Found uie-base/special_tokens_map.json
INFO 2022-08-02 14:28:41,081 download.py:117] unique_endpoints {''}
INFO 2022-08-02 14:28:41,081 download.py:154] Found uie-base/tokenizer_config.json
[2022-08-02 14:28:41,081] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'uie-base'.
W0802 14:28:41.101258 12087 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.6, Runtime API Version: 11.2
W0802 14:28:41.121047 12087 gpu_context.cc:306] device: 0, cuDNN Version: 8.1.
DocumentReview/UIETool/run.sh: line 17: 12087 Killed python DocumentReview/UIETool/finetune.py --train_path "data/doccano_data/fangwuzulin/train.txt" --dev_path "data/doccano_data/fangwuzulin/dev.txt" --save_dir "model/uie_model/new/fangwuzulin/" --learning_rate 1e-5 --batch_size 16 --max_seq_len 512 --num_epochs 100 --model "uie-base" --seed 1000 --logging_steps 10 --valid_steps 500 --device "gpu"
Hi, does this problem also occur when you train with the default example? https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/uie#%E8%AE%AD%E7%BB%83%E5%AE%9A%E5%88%B6
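By "default example", do you mean training on the default dataset? At the moment the error comes from load_dataset:

train_ds = load_dataset(reader, data_path="data/doccano_data/train.txt", max_seq_len=512, lazy=False)

Is there a way to skip problematic samples while the dataset is being loaded, something like a try block?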
You can add a try...except around the data-reading code here to catch the malformed samples: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/uie/utils.py#L182
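The try...except suggestion can be sketched as a reader that logs and skips bad lines instead of raising. This is a minimal, hypothetical example, not the code in utils.py: the function name `safe_reader` is made up, and it assumes each line of the training file is one JSON object with "content", "result_list" and "prompt" keys (the assumed format produced by the doccano conversion step). The real reader also splits long texts by `max_seq_len`, which this sketch does not do; in practice the same try...except would go around the per-line parsing inside that reader.

```python
import json
import logging
from typing import Dict, Iterator

logger = logging.getLogger(__name__)

# Hypothetical: fields each record is assumed to contain.
REQUIRED_KEYS = ("content", "result_list", "prompt")


def safe_reader(data_path: str) -> Iterator[Dict]:
    """Yield parsed records from a JSON-lines file, skipping malformed lines.

    Hypothetical sketch; the real UIE reader in utils.py does more work
    (e.g. splitting long texts by max_seq_len).
    """
    with open(data_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                record = json.loads(line)  # JSONDecodeError is a ValueError
                if not isinstance(record, dict) or any(
                    key not in record for key in REQUIRED_KEYS
                ):
                    raise ValueError("missing required fields")
            except ValueError as exc:
                # Log the bad sample and move on instead of aborting loading
                logger.warning("Skipping line %d of %s: %s", line_no, data_path, exc)
                continue
            yield record
```

Because the function is a generator taking `data_path`, it could be passed to `load_dataset` the same way as the original reader (for example `load_dataset(safe_reader, data_path=..., lazy=False)`), dropping the `max_seq_len` argument since this sketch does not accept it. Logging each skipped line keeps the bad samples visible so they can be fixed in Doccano later.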
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.