mindcv
RuntimeError: Run task for graph:kernel_graph_37 error
Following https://github.com/mindspore-lab/mindcv/tree/main/configs/convit, I ran:
python train.py --config configs/convit/convit_tiny_ascend.yaml --data_dir /home/ma-user/work/data --distribute False
Error: RuntimeError: Run task for graph:kernel_graph_37 error! The details refer to 'Ascend Error Message'.
Environment: MindSpore 2.0.0
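The log warns that `MS_OM_PATH` is not set, so node dumps are written under `./rank_0/node_dump` in the working directory. Below is a minimal sketch for rerunning the same command with that variable pointing at a dedicated directory. It assumes `train.py` is launched from the mindcv repository root. The helper names `build_launch` and `launch` and the dump directory are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Tuple


def build_launch(data_dir: str, dump_dir: str) -> Tuple[List[str], Dict[str, str]]:
    """Build the train.py command and a copy of the environment with MS_OM_PATH set."""
    cmd = [
        sys.executable, "train.py",
        "--config", "configs/convit/convit_tiny_ascend.yaml",
        "--data_dir", data_dir,
        "--distribute", "False",
    ]
    env = dict(os.environ)        # copy so the parent environment is left untouched
    env["MS_OM_PATH"] = dump_dir  # directory the runtime should use for node dumps
    return cmd, env


def launch(data_dir: str, dump_dir: str) -> int:
    """Create the dump directory, run training, and return the exit code."""
    Path(dump_dir).mkdir(parents=True, exist_ok=True)
    cmd, env = build_launch(data_dir, dump_dir)
    return subprocess.run(cmd, env=env, check=False).returncode
```

For example, you could call `launch("/home/ma-user/work/data", "/home/ma-user/work/om_dump")`, where the second path is a placeholder. Exporting `MS_OM_PATH` in the shell before running the original command should have the same effect.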
log:
[WARNING] MD(17532,fffd117fa1e0,python):2023-03-09-20:18:49.009.973 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:832] DetectPerBatchTime] Bad performance attention, it takes more than 25 seconds to fetch a batch of data from dataset pipeline, which might result GetNext timeout problem. You may test dataset processing performance(with creating dataset iterator) and optimize it.
[WARNING] DEVICE(17532,fffe74f491e0,python):2023-03-09-20:18:50.318.329 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:727] GetDumpPath] The environment variable 'MS_OM_PATH' is not set, the files of node dump will save to the process local path, as ./rank_id/node_dump/...
[ERROR] DEVICE(17532,fffe74f491e0,python):2023-03-09-20:18:50.318.418 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:745] DumpTaskExceptionInfo] Task fail infos task_id: 2, stream_id: 1060, tid: 17532, device_id: 0, retcode: 507011 ( model execute failed)
[ERROR] DEVICE(17532,fffe74f491e0,python):2023-03-09-20:18:50.318.674 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:754] DumpTaskExceptionInfo] Dump node (Default/network-TrainOneStepCell/optimizer-AdamW/learning_rate-_IteratorLearningRate/GatherV2-op14983) task error input/output data to: ./rank_0/node_dump
The function call stack:
In file /home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/optim/optimizer.py:966/ return self.gather(self.learning_rate, global_step, 0)/
[WARNING] MD(17532,fffd117fa1e0,python):2023-03-09-20:18:50.791.545 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:257] SendDataToAscend] Thread has already been terminated.
Traceback (most recent call last):
File "train.py", line 308, in
- C++ Call Stack: (For framework developers)
mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_graph_executor.cc:239 RunGraph
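The first warning says that fetching one batch took more than 25 seconds, and it suggests testing dataset processing performance by creating a dataset iterator. Below is a minimal sketch of such a timing loop. It accepts any iterable, so with MindSpore you would pass something like `dataset.create_tuple_iterator(num_epochs=1)` on the training dataset that mindcv builds. How to obtain that dataset object from `train.py` is not shown here. The helper names are hypothetical.

```python
import time
from typing import Iterable, List


def time_batches(batches: Iterable, max_batches: int = 20) -> List[float]:
    """Return the seconds spent fetching each of the first `max_batches` batches."""
    durations: List[float] = []
    it = iter(batches)
    for _ in range(max_batches):
        start = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break  # the pipeline ran out before max_batches
        durations.append(time.perf_counter() - start)
    return durations


def slow_batches(durations: List[float], threshold_s: float = 25.0) -> List[int]:
    """Indices of batches slower than the threshold (25 s matches the warning)."""
    return [i for i, d in enumerate(durations) if d > threshold_s]


if __name__ == "__main__":
    # Stand-in pipeline: each batch takes about 10 ms to produce.
    def fake_pipeline(n: int, delay_s: float):
        for i in range(n):
            time.sleep(delay_s)
            yield i

    times = time_batches(fake_pipeline(3, 0.01), max_batches=5)
    print([round(t, 3) for t in times], slow_batches(times))
```

If most batches come in near or above the threshold, the data pipeline, rather than the device, is the first thing to look at.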
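The failing node is `GatherV2` inside `_IteratorLearningRate`. The call stack shows it executing `self.gather(self.learning_rate, global_step, 0)`, which looks up the current step's learning rate in a per-step tensor. One hypothesis worth ruling out is that this per-step schedule is shorter than the number of steps actually run, which would push the gather index out of range on the device. This is an assumption, not a confirmed cause. Below is a minimal host-side check under that assumption. Here `steps_per_epoch` would typically come from `dataset.get_dataset_size()`, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple


def check_lr_schedule(lr_per_step: Sequence[float], epochs: int,
                      steps_per_epoch: int) -> Tuple[bool, int, int]:
    """Return (covers_all_steps, total_steps, schedule_length)."""
    total_steps = epochs * steps_per_epoch
    return len(lr_per_step) >= total_steps, total_steps, len(lr_per_step)


def lr_at(lr_per_step: Sequence[float], global_step: int) -> float:
    """Host-side stand-in for gather(learning_rate, global_step, 0) on a 1-D schedule."""
    if not 0 <= global_step < len(lr_per_step):
        raise IndexError(
            f"global_step {global_step} is outside a schedule of length {len(lr_per_step)}"
        )
    return lr_per_step[global_step]
```

For example, a 10-entry schedule for 3 epochs of 4 steps gives `check_lr_schedule([0.1] * 10, 3, 4) == (False, 12, 10)`, and `lr_at` would raise at step 10.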