mmdetection

Jupyter notebook kernel crashed

Open • blakeliu opened this issue 2 years ago • 1 comment

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug

When I run demo/MMDet_InstanceSeg_Tutorial.ipynb and start training a new detector, the kernel crashes.

Reproduction

Open demo/MMDet_InstanceSeg_Tutorial.ipynb in VS Code and change the kernel.

  1. Did you make any modifications on the code or config? Did you understand what you have modified? I changed the following code:
     # We can set the evaluation interval to reduce the evaluation times
     cfg.evaluation.interval = 1
  2. What dataset did you use?

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
     sys.platform: linux
     Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
     CUDA available: True
     GPU 0: NVIDIA GeForce RTX 3090
     GPU 1: NVIDIA GeForce GTX 1080 Ti
     CUDA_HOME: None
     GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
     PyTorch: 1.8.2+cu111
     PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 11.1
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
      - CuDNN 8.0.5
      - Magma 2.5.2
      - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
     TorchVision: 0.9.2+cu111
     OpenCV: 4.6.0
     MMCV: 1.6.0
     MMCV Compiler: GCC 7.3
     MMCV CUDA Compiler: 11.1
     MMDetection: 2.25.0+b64386b
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
      pip install torch==1.8.2 torchvision==0.9.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
      pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

If applicable, paste the error traceback here.

loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
Output exceeds the size limit. Open the full output data in a text editor
2022-08-05 16:34:36,421 - mmdet - INFO - Automatic scaling of learning rate (LR) has been disabled.
2022-08-05 16:34:36,431 - mmdet - INFO - load checkpoint from local path: checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth
2022-08-05 16:34:36,544 - mmdet - WARNING - The model and loaded state dict do not match exactly

size mismatch for roi_head.bbox_head.fc_cls.weight: copying a param with shape torch.Size([81, 1024]) from checkpoint, the shape in current model is torch.Size([2, 1024]).
size mismatch for roi_head.bbox_head.fc_cls.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for roi_head.bbox_head.fc_reg.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([4, 1024]).
size mismatch for roi_head.bbox_head.fc_reg.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([4]).
size mismatch for roi_head.mask_head.conv_logits.weight: copying a param with shape torch.Size([80, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 256, 1, 1]).
size mismatch for roi_head.mask_head.conv_logits.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([1]).
2022-08-05 16:34:36,547 - mmdet - INFO - Start running, host: tf@tf-MW51-HP0-00, work_dir: /home/tf/PycharmProjects/det/mmdetection/tutorial_exps
2022-08-05 16:34:36,548 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook                  
(NORMAL      ) CheckpointHook                     
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook                  
(NORMAL      ) NumClassCheckHook                  
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
...
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
2022-08-05 16:34:36,549 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2022-08-05 16:34:36,550 - mmdet - INFO - Checkpoints will be saved to /home/tf/PycharmProjects/det/mmdetection/tutorial_exps by HardDiskBackend.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2022-08-05 16:34:40,816 - mmdet - INFO - Epoch [1][10/31]	lr: 2.500e-03, eta: 0:02:27, time: 0.407, data_time: 0.238, memory: 3641, loss_rpn_cls: 0.0279, loss_rpn_bbox: 0.0167, loss_cls: 0.3629, acc: 84.2090, loss_bbox: 0.4120, loss_mask: 0.4681, loss: 1.2876
2022-08-05 16:34:42,572 - mmdet - INFO - Epoch [1][20/31]	lr: 2.500e-03, eta: 0:01:42, time: 0.176, data_time: 0.014, memory: 3641, loss_rpn_cls: 0.0417, loss_rpn_bbox: 0.0142, loss_cls: 0.1402, acc: 95.7910, loss_bbox: 0.3046, loss_mask: 0.1150, loss: 0.6157
2022-08-05 16:34:44,315 - mmdet - INFO - Epoch [1][30/31]	lr: 2.500e-03, eta: 0:01:26, time: 0.175, data_time: 0.015, memory: 3665, loss_rpn_cls: 0.0159, loss_rpn_bbox: 0.0100, loss_cls: 0.0620, acc: 97.6953, loss_bbox: 0.1297, loss_mask: 0.1267, loss: 0.3444
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 13/13, 7.0 task/s, elapsed: 2s, ETA:     0s
Output exceeds the size limit. Open the full output data in a text editor
2022-08-05 16:34:47,852 - mmdet - INFO - Evaluating bbox...
2022-08-05 16:34:47,896 - mmdet - INFO - 
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.642
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.859
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.799
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.118
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.571
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.692
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.688
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.688
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.688
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.350
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.608
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.733

2022-08-05 16:34:47,897 - mmdet - INFO - Evaluating segm...
/home/tf/PycharmProjects/det/mmdetection/mmdet/datasets/coco.py:470: UserWarning: The key "bbox" is deleted for more accurate mask AP of small/medium/large instances since v2.12.0. This does not change the overall mAP calculation.
  warnings.warn(
/home/tf/miniconda3/lib/python3.8/site-packages/pycocotools/cocoeval.py:378: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  tp_sum = np.cumsum(tps, axis=1).astype(dtype=np.float)
2022-08-05 16:34:47,948 - mmdet - INFO - 
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.761
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.859
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.839
...
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.708
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.864

2022-08-05 16:34:47,950 - mmdet - INFO - Epoch(val) [1][13]	bbox_mAP: 0.6420, bbox_mAP_50: 0.8590, bbox_mAP_75: 0.7990, bbox_mAP_s: 0.1180, bbox_mAP_m: 0.5710, bbox_mAP_l: 0.6920, bbox_mAP_copypaste: 0.642 0.859 0.799 0.118 0.571 0.692, segm_mAP: 0.7610, segm_mAP_50: 0.8590, segm_mAP_75: 0.8390, segm_mAP_s: 0.0400, segm_mAP_m: 0.6620, segm_mAP_l: 0.8280, segm_mAP_copypaste: 0.761 0.859 0.839 0.040 0.662 0.828
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.03s).
Accumulating evaluation results...
DONE (t=0.01s).
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=0.03s).
Accumulating evaluation results...
DONE (t=0.02s).
Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click [here](https://aka.ms/vscodeJupyterKernelCrash) for more info. View Jupyter [log](command:jupyter.viewOutput) for further details.
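
The size-mismatch warnings in the log above are what one would expect when a checkpoint trained on the 80 COCO classes is loaded into a model with a single foreground class, and they are probably unrelated to the crash, since training and evaluation both ran afterwards. The sketch below reproduces the reported shapes under the assumption that the heads use one extra background class for classification and class-specific box regression and mask prediction; `expected_head_shapes` is a hypothetical helper, not an MMDetection function.

```python
def expected_head_shapes(num_classes, fc_channels=1024, mask_channels=256):
    """Shapes of the class-dependent layers in a Mask R-CNN-style head.

    Assumption: softmax classification with one background class, and
    class-specific box regression and mask logits.
    """
    return {
        "roi_head.bbox_head.fc_cls.weight": (num_classes + 1, fc_channels),
        "roi_head.bbox_head.fc_cls.bias": (num_classes + 1,),
        "roi_head.bbox_head.fc_reg.weight": (4 * num_classes, fc_channels),
        "roi_head.bbox_head.fc_reg.bias": (4 * num_classes,),
        "roi_head.mask_head.conv_logits.weight": (num_classes, mask_channels, 1, 1),
        "roi_head.mask_head.conv_logits.bias": (num_classes,),
    }


checkpoint_shapes = expected_head_shapes(80)  # COCO-pretrained checkpoint
model_shapes = expected_head_shapes(1)        # single-class model from the log

for name in checkpoint_shapes:
    print(f"{name}: checkpoint {checkpoint_shapes[name]}, model {model_shapes[name]}")
```

The printed pairs match the six mismatches in the log (81 vs. 2, 320 vs. 4, 80 vs. 1), which suggests those layers are simply re-initialized for fine-tuning.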

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
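
When the reason is not yet known, as here, one low-cost way to gather more information is Python's standard-library `faulthandler` module, which writes the Python stack to a file when the process receives a fatal signal. This is a minimal sketch that assumes the crash is a native fault such as a segmentation fault; it records nothing if the process is killed outright (for example by the out-of-memory killer). `enable_crash_log` and `kernel_crash.log` are hypothetical names chosen for illustration.

```python
import faulthandler


def enable_crash_log(path="kernel_crash.log"):
    """Write Python stack traces to `path` if the process hits a fatal signal."""
    # The file must stay open while the handler is installed, so return it
    # and keep a reference alive in the notebook.
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Run this in the first notebook cell, before training starts.
crash_log = enable_crash_log()
```

After the kernel dies, the log file should show which Python call was active at the time, which helps tell a crash in evaluation code apart from one in plotting or logging.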

blakeliu • Aug 05 '22 08:08

I haven't encountered this error before. You may want to refer to https://github.com/microsoft/vscode-jupyter/wiki/Kernel-crashes-when-using-numpy

BIGWangYuDong • Aug 08 '22 01:08
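
The environment report in this issue does not list the NumPy version, which is the first thing to check when following up on the link above. A small sketch for collecting it, using only the standard library; the package list is an assumption about what is relevant here, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "pycocotools", "torch", "mmcv-full", "mmdet")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the same kernel that crashes, since VS Code may select a different interpreter than the one used on the command line.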