
RuntimeError: Run task for graph:kernel_graph_37 error

Open YUANMU227 opened this issue 2 years ago • 0 comments

Using the config from https://github.com/mindspore-lab/mindcv/tree/main/configs/convit, I ran: python train.py --config configs/convit/convit_tiny_ascend.yaml --data_dir /home/ma-user/work/data --distribute False

Error: RuntimeError: Run task for graph:kernel_graph_37 error! The details refer to 'Ascend Error Message'.

Environment: MindSpore 2.0.0

Log:

[WARNING] MD(17532,fffd117fa1e0,python):2023-03-09-20:18:49.009.973 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:832] DetectPerBatchTime] Bad performance attention, it takes more than 25 seconds to fetch a batch of data from dataset pipeline, which might result GetNext timeout problem. You may test dataset processing performance(with creating dataset iterator) and optimize it.
[WARNING] DEVICE(17532,fffe74f491e0,python):2023-03-09-20:18:50.318.329 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:727] GetDumpPath] The environment variable 'MS_OM_PATH' is not set, the files of node dump will save to the process local path, as ./rank_id/node_dump/...
[ERROR] DEVICE(17532,fffe74f491e0,python):2023-03-09-20:18:50.318.418 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:745] DumpTaskExceptionInfo] Task fail infos task_id: 2, stream_id: 1060, tid: 17532, device_id: 0, retcode: 507011 ( model execute failed)
[ERROR] DEVICE(17532,fffe74f491e0,python):2023-03-09-20:18:50.318.674 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:754] DumpTaskExceptionInfo] Dump node (Default/network-TrainOneStepCell/optimizer-AdamW/learning_rate-_IteratorLearningRate/GatherV2-op14983) task error input/output data to: ./rank_0/node_dump
The function call stack:
In file /home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/optim/optimizer.py:966/ return self.gather(self.learning_rate, global_step, 0)/

[WARNING] MD(17532,fffd117fa1e0,python):2023-03-09-20:18:50.791.545 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:257] SendDataToAscend] Thread has already been terminated.
Traceback (most recent call last):
  File "train.py", line 308, in <module>
    train(args)
  File "train.py", line 294, in train
    trainer.train(args.epoch_size, loader_train, callbacks=callbacks, dataset_sink_mode=args.dataset_sink_mode)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/model.py", line 1054, in train
    initial_epoch=initial_epoch)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/model.py", line 98, in wrapper
    func(self, *args, **kwargs)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/model.py", line 616, in _train
    cb_params, sink_size, initial_epoch, valid_infos)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/model.py", line 703, in _train_dataset_sink_process
    list_callback.on_train_step_end(run_context)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/callback/_callback.py", line 381, in on_train_step_end
    cb.on_train_step_end(run_context)
  File "/home/ma-user/work/mindcv-main/mindcv/engine/callbacks.py", line 151, in on_train_step_end
    cur_lr = optimizer.learning_rate(step - 1)[0].asnumpy()
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/cell.py", line 631, in __call__
    out = self.compile_and_run(*args)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/cell.py", line 954, in compile_and_run
    return _cell_graph_executor(self, *new_inputs, phase=self.phase)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 1438, in __call__
    return self.run(obj, *args, phase=phase)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 1475, in run
    return self._exec_pip(obj, *args, phase=phase_real)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 101, in wrapper
    results = fn(*arg, **kwargs)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 1457, in _exec_pip
    return self._graph_executor(args, phase)
RuntimeError: Run task for graph:kernel_graph_37 error! The details refer to 'Ascend Error Message'.


  • C++ Call Stack: (For framework developers)

mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_graph_executor.cc:239 RunGraph
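
Triage note: the node dumped in the log is the GatherV2 behind `self.gather(self.learning_rate, global_step, 0)`, and the Python traceback reaches it through `optimizer.learning_rate(step - 1)` in `mindcv/engine/callbacks.py`. One thing that may be worth checking (an assumption, not a confirmed cause of this failure) is whether the index passed to that lookup stays inside the per-step learning-rate list. The sketch below is plain Python, not MindSpore code; `lookup_lr` and `schedule` are hypothetical names used only to illustrate a bounds-checked version of the same lookup.

```python
def lookup_lr(lr_schedule, global_step):
    """Return the learning rate for global_step, failing loudly on a bad index.

    Plain-Python stand-in for gather(learning_rate, global_step, 0);
    this is an illustration, not the MindSpore implementation.
    """
    if not 0 <= global_step < len(lr_schedule):
        raise IndexError(
            f"global_step={global_step} is outside the schedule "
            f"(valid range 0..{len(lr_schedule) - 1})"
        )
    return lr_schedule[global_step]


# Hypothetical schedule: one learning rate per training step.
schedule = [0.001, 0.001, 0.0005, 0.0005, 0.00025]

# The callback in the traceback queries index `step - 1`,
# so step 0 and any step past the schedule length are rejected here.
for step in (1, len(schedule), 0, len(schedule) + 1):
    try:
        print(step, lookup_lr(schedule, step - 1))
    except IndexError as err:
        print(step, "->", err)
```

If the logged step count and the length of the learning-rate list disagree in the real run, that would point at the schedule length or the step counter rather than at the model itself.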

YUANMU227, Mar 09 '23 12:03