PaddleX
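Training suddenly failed with an error after running for 4 days
Checklist:
- Searched past related issues for an answer
- Read the FAQ summary and its answers
- Confirmed whether the bug is still unfixed in the latest version
- Read the PaddleX documentation
Problem description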
2023-02-12 11:33:49 [INFO] [TRAIN] Epoch=44/300, Step=1365/2425, loss_xy=4.820615, loss_wh=5.574079, loss_iou=19.460461, loss_iou_aware=4.270789, loss_obj=34.582016, loss_cls=7.113638, loss=75.821602, lr=0.000417, time_each_step=2.14s, eta=369:17:28
2023-02-12 11:34:13 [INFO] [TRAIN] Epoch=44/300, Step=1375/2425, loss_xy=2.955108, loss_wh=3.437961, loss_iou=12.110320, loss_iou_aware=2.348799, loss_obj=20.727703, loss_cls=3.568110, loss=45.148003, lr=0.000417, time_each_step=2.38s, eta=411:55:13
2023-02-12 11:34:36 [INFO] [TRAIN] Epoch=44/300, Step=1385/2425, loss_xy=4.763587, loss_wh=8.038626, loss_iou=21.165606, loss_iou_aware=3.268857, loss_obj=26.561720, loss_cls=4.742907, loss=68.541298, lr=0.000417, time_each_step=2.29s, eta=395:54:13
Exception in thread Thread-45:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 583, in _get_data
data = self._data_queue.get(timeout=self._timeout)
File "/root/miniconda3/lib/python3.8/multiprocessing/queues.py", line 108, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/root/miniconda3/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 505, in _thread_loop
batch = self._get_data()
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 598, in _get_data
raise RuntimeError("DataLoader {} workers exit unexpectedly, " \
RuntimeError: DataLoader 1 workers exit unexpectedly, pids: 145393
Traceback (most recent call last):
File "demo.py", line 46, in <module>
model.train(
File "/root/miniconda3/lib/python3.8/site-packages/paddlex/cv/models/detector.py", line 323, in train
self.train_loop(
File "/root/miniconda3/lib/python3.8/site-packages/paddlex/cv/models/base.py", line 333, in train_loop
for step, data in enumerate(self.train_data_loader()):
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 697, in __next__
data = self._reader.read_next_var_list()
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)
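Reproduction
- Have you successfully run the tutorials we provide?
- Did you modify the code on top of a tutorial? If so, please provide the code you ran.
- Which dataset are you using?
- Please provide the error message and the related log.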
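Environment
- Please provide the versions of PaddlePaddle and PaddleX you are using.
- Please provide your operating system, e.g. Linux/Windows/MacOS.
- Which Python version are you using?
- Which CUDA/cuDNN versions are you using?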
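Judging from the error, a DataLoader worker failed while reading sample data. Please check whether the dataset files or the system (for example, disk space) were in an abnormal state at the time of the failure.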
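The two checks suggested above can be sketched with the Python standard library alone. This is only a rough diagnostic, not part of PaddleX: the function names `free_space_gib` and `find_unreadable_files` are made up for illustration, and the sketch assumes the dataset lives in an ordinary directory tree. It cannot tell whether an image is corrupt at the decoding level, only whether each file can be read from disk.

```python
import os
import shutil


def free_space_gib(path):
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def find_unreadable_files(root):
    """Walk `root` and return (path, error) pairs for files that cannot be read in full."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Read the whole file in 1 MiB chunks so I/O errors surface here
                # instead of inside a DataLoader worker.
                with open(path, "rb") as f:
                    while f.read(1 << 20):
                        pass
            except OSError as exc:
                failures.append((path, str(exc)))
    return failures
```

For example, calling `free_space_gib("/")` and `find_unreadable_files("path/to/dataset")` (a placeholder path) on the training machine shows whether the disk is full and whether any sample file has become unreadable since training started.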