Error during training (Assertion input_val >= zero && input_val <= one failed.)

Open MeteoriteWeny opened this issue 3 years ago • 6 comments

Problem

Thank you for your contribution. I ran into exploding gradients while training the model tood_r50_fpn_1x_coco.

  • I tried training this model with mixed-precision training, with the loss scale set to 'dynamic'. Training soon stopped and raised RuntimeError: CUDA error: device-side assert triggered.

  • I also retrained the model in FP32 precision, but that did not help.

  • A lower learning rate did not fix the exploding gradients.

  • Gradient clipping avoids the training failure (mixed-precision training, loss scale = 512), but the model does not converge.

    I searched for this issue online. I do not think it is an OOM problem. It seems to be related to NaN values in the prediction head, which then cause the error when the loss is calculated. I do not know whether the environment (MMDet-2.15.0) affects training.
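
The assertion in the title is a range check on the values handed to the loss. As a rough, framework-free sketch of why NaN predictions would trip it: NaN fails every comparison, so a check of the form `0 <= v <= 1` rejects it just as it rejects values outside the range. The helper names below (`find_invalid_probabilities`, `describe`) are hypothetical and only illustrate the check; they are not part of TOOD, MMDetection, or PyTorch.

```python
import math


def find_invalid_probabilities(values):
    """Return indices of entries that would fail the check `0 <= v <= 1`.

    NaN compares false against everything, so NaN entries are reported too.
    """
    return [i for i, v in enumerate(values) if not (0.0 <= v <= 1.0)]


def describe(values):
    """Split the invalid entries into NaN and plain out-of-range counts."""
    bad = find_invalid_probabilities(values)
    nan_count = sum(1 for i in bad if math.isnan(values[i]))
    return {
        "invalid": len(bad),
        "nan": nan_count,
        "out_of_range": len(bad) - nan_count,
    }


# Made-up scores standing in for classification outputs after a sigmoid.
scores = [0.2, float("nan"), 1.0, 1.5, -0.1, 0.0]
print(find_invalid_probabilities(scores))  # [1, 3, 4]
print(describe(scores))
```

Running the same kind of check on the head outputs just before the loss is one way to tell whether NaN values or genuinely out-of-range scores are responsible.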

My modification

  • I ported the TOOD code to my working environment (MMDet-2.15.0) without modification.
  • I edited the training config to train on my own dataset.
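
For reference, the mixed-precision and gradient-clipping settings mentioned in the problem description would look roughly like this in an MMDetection 2.x style config. This is a sketch, not the config actually used: the report gives the loss-scale values ('dynamic' and 512) but not the clipping threshold, so `max_norm=35` is an assumed value.

```python
# Mixed-precision training: loss scale as described in the report.
fp16 = dict(loss_scale='dynamic')   # first attempt, which hit the device-side assert
# fp16 = dict(loss_scale=512.)      # fixed scale used in the gradient-clipping run

# Gradient clipping. max_norm=35 is an assumption; the report does not state it.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```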

Environment

2021-12-09 16:50:01,643 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2070
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.4.r11.4/compiler.30033411_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.10.0
OpenCV: 4.5.3
MMCV: 1.3.10
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.15.0+87eda06
------------------------------------------------------------

Error Report

/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [32,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [33,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [34,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [35,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [36,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [37,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [38,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [39,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [40,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [41,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [42,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [43,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [44,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [45,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [46,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [47,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [48,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [49,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [50,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [51,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [52,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [53,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [54,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [55,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [56,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [57,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [58,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [59,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [32,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [33,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [34,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [35,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [36,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [37,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [38,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [39,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [40,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [41,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [42,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [43,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [44,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [45,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [46,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [47,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [48,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [49,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [50,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [51,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [52,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [53,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [54,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [55,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [56,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [57,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [58,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [59,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [10,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [11,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [12,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [13,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [14,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [15,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [16,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [17,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [18,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [19,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [20,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [21,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [22,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [23,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [24,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [25,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [26,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [27,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [28,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [29,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [30,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [10,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [11,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [12,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [13,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [14,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [15,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [16,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [17,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [18,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [19,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [20,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [21,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [22,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [23,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [24,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [25,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [26,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [27,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [28,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [29,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [30,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [10,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [11,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [12,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [13,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [14,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [15,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [16,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [17,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [18,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [19,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [20,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [21,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [22,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [23,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [24,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [25,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [26,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [27,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [28,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [29,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [30,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
Traceback (most recent call last):
  File "tools/train.py", line 188, in <module>
    main()
  File "tools/train.py", line 184, in main
    meta=meta)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/detectors/single_stage.py", line 83, in forward_train
    gt_labels, gt_bboxes_ignore)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/dense_heads/base_dense_head.py", line 54, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 185, in new_func
    return old_func(*args, **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/dense_heads/tood_head.py", line 426, in loss
    num_total_samples=num_total_samples)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/core/utils/misc.py", line 29, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/dense_heads/tood_head.py", line 333, in loss_single
    & (labels < bg_class_ind)).nonzero().squeeze(1)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1623448265233/work/c10/cuda/CUDACachingAllocator.cpp:1055 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f12c21efa22 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x10ac3 (0x7f12c2451ac3 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a7 (0x7f12c2453167 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f12c21d95a4 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0xa2bb12 (0x7f133bad0b12 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0xa2bbb1 (0x7f133bad0bb1 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #24: __libc_start_main + 0xe7 (0x7f1376d75bf7 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted
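
The `Loss.cu` assertion in the log above comes from the binary cross-entropy CUDA kernel, which requires every input value to lie in [0, 1]; a NaN fails that comparison as well. As a rough way to confirm which predictions are at fault, a check like the following could be run on the scores just before the loss call. The helper name `find_invalid_bce_inputs` is hypothetical and not part of mmdet or PyTorch, and it assumes the scores were first copied to a flat Python list (for a tensor, something like `scores.detach().cpu().flatten().tolist()`).

```python
def find_invalid_bce_inputs(scores):
    """Return (index, value) pairs that violate 0 <= value <= 1.

    Hypothetical debugging helper: mirrors the range condition that the
    kernel asserts on, applied to a flat list of floats.
    """
    bad = []
    for i, v in enumerate(scores):
        # NaN fails every comparison, so the same range test catches it;
        # +inf and -inf fall outside the range as well.
        if not (0.0 <= v <= 1.0):
            bad.append((i, v))
    return bad


if __name__ == "__main__":
    sample = [0.2, float("nan"), 1.5, -0.1, 1.0]
    for index, value in find_invalid_bce_inputs(sample):
        print(f"invalid score at index {index}: {value}")
```

If the list comes back non-empty, the problem is upstream of the loss (for example NaN predictions or targets), which would match the suspicion about NaN values in the prediction head.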

MeteoriteWeny commented Dec 10 '21 03:12

Same issue

one23sunnyQQ commented Dec 10 '21 09:12

Same issue. My mmdet version is 2.19.0, and the error is raised during the 3rd training epoch.

Bo396543018 commented Dec 22 '21 14:12

You can try to clamp the value of the box area when computing GIoU loss, e.g., https://github.com/fcjian/TOOD/blob/93b3a87556e361f7d56507bd56943cf121c3caa2/mmdet/core/bbox/iou_calculators/iou2d_calculator.py#L212-L215

fcjian commented Dec 25 '21 02:12

Hello sir, I have clamped the value of the box area as you showed, but training still crashes at the 5th epoch. My mmdet version is 2.14.0+d3e713d.

Error Report:

/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [26,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [27,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [29,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [30,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [31,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [32,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [33,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [34,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [35,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [36,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [37,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [38,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [39,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [40,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [41,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [42,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [43,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [44,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [45,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [46,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [47,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [48,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [49,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [50,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [51,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [52,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [53,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [54,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [55,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [56,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [57,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [58,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [59,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [60,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [61,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [62,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [63,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [0,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [1,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [2,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [3,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [4,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [5,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [6,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [7,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [8,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [9,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [10,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [11,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [12,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [13,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [14,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [15,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [16,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [17,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [18,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [19,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [20,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [21,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [22,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [23,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [24,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [25,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [26,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [27,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [29,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [30,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [31,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [32,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [33,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [34,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [35,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [36,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [37,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [38,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [39,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [40,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [41,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [42,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [43,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [44,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [45,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [46,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [47,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [48,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [49,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [50,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [51,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [52,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [53,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [54,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [55,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [56,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [57,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [58,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [59,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [60,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [61,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [62,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [63,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [0,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [1,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [2,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [3,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [4,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [5,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [6,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [7,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [8,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [9,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [10,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [11,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [12,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [13,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [14,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [15,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [16,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [17,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [18,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [19,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [20,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [21,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [22,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [23,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [24,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [25,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [26,0,0] Assertion input_val >= zero && input_val <= one failed. /opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [27,0,0] Assertion input_val >= zero && input_val <= one failed. 
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [29,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [30,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [31,0,0] Assertion input_val >= zero && input_val <= one failed.
Traceback (most recent call last):
  File "./tools/train.py", line 188, in <module>
    main()
  File "./tools/train.py", line 184, in main
    meta=meta)
  File "/mnt/mhm/project/TODO/TOOD/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/mnt/mhm/project/TODO/TOOD/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "/mnt/mhm/project/TODO/TOOD/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/mnt/mhm/project/TODO/TOOD/mmdet/models/detectors/single_stage.py", line 83, in forward_train
    gt_labels, gt_bboxes_ignore)
  File "/mnt/mhm/project/TODO/TOOD/mmdet/models/dense_heads/base_dense_head.py", line 54, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 186, in new_func
    return old_func(*args, **kwargs)
  File "/mnt/mhm/project/TODO/TOOD/mmdet/models/dense_heads/tood_head.py", line 447, in loss
    num_total_samples=num_total_samples)
  File "/mnt/mhm/project/TODO/TOOD/mmdet/core/utils/misc.py", line 29, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/mnt/mhm/project/TODO/TOOD/mmdet/models/dense_heads/tood_head.py", line 354, in loss_single
    & (labels < bg_class_ind)).nonzero().squeeze(1)
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1614378098133/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fdf062a32f2 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fdf062a067b in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7fdf064fc219 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fdf0628b3a4 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e6a3a (0x7fdf5d204a3a in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e6ae1 (0x7fdf5d204ae1 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x1817da (0x55b94fc797da in /root/anaconda3/envs/open-mmlab/bin/python)
frame #7: <unknown function> + 0xfbfa9 (0x55b94fbf3fa9 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #8: <unknown function> + 0xfa8c8 (0x55b94fbf28c8 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #9: <unknown function> + 0xfa8c8 (0x55b94fbf28c8 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #10: <unknown function> + 0xfa2d8 (0x55b94fbf22d8 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #11: <unknown function> + 0xfad68 (0x55b94fbf2d68 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #12: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #13: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #14: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #15: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #16: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #17: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #18: <unknown function> + 0x12b327 (0x55b94fc23327 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyDict_SetItemString + 0x89 (0x55b94fc2fe59 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #20: PyImport_Cleanup + 0xab (0x55b94fca4d0b in /root/anaconda3/envs/open-mmlab/bin/python)
frame #21: Py_FinalizeEx + 0x64 (0x55b94fd19304 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #22: <unknown function> + 0x232960 (0x55b94fd2a960 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #23: _Py_UnixMain + 0x3c (0x55b94fd2accc in /root/anaconda3/envs/open-mmlab/bin/python)
frame #24: __libc_start_main + 0xf0 (0x7fdf9851e830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #25: <unknown function> + 0x1d7555 (0x55b94fccf555 in /root/anaconda3/envs/open-mmlab/bin/python)

Killing subprocess 19911
Traceback (most recent call last):
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=0',

Thank you for your reply.

GloriaHM avatar Dec 29 '21 02:12 GloriaHM
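
A note on reading the traceback above: the assertion fires inside a CUDA kernel (Loss.cu), yet the Python traceback ends at a `.nonzero()` call in loss_single. CUDA kernels are launched asynchronously, so the Python frame shown is often just the next call that synchronizes with the GPU, not the call that produced the bad value. A minimal sketch of making launches synchronous while debugging, assuming the standard `CUDA_LAUNCH_BLOCKING` environment variable; the helper name `enable_blocking_cuda_launches` is hypothetical and not part of TOOD or mmdet:

```python
import os


def enable_blocking_cuda_launches(env=os.environ):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    With this set, each kernel launch blocks until the kernel finishes,
    so a device-side assert is reported at the Python call that launched it.
    Training runs slower, so this is for debugging only.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Assumption: this runs at the very top of the training script,
# before torch initializes CUDA; setting it later has no effect.
enable_blocking_cuda_launches()
```

Exporting the variable in the shell before starting the training launcher should have the same effect.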

@fcjian Thanks for the reply! It fixes the CUDA error, but the model still does not converge. Training behaves much as it did when I used gradient clipping: the log shows a sudden increase in loss (loss_cls jumps from 0.6210 to 1.1693 between iterations 900 and 950 of epoch 1), and after that the loss only fluctuates within a narrow range. I'll try again with the original TOOD code, without transferring it to a higher mmdet version.

2021-12-29 09:32:52,217 - mmdet - INFO - Epoch [1][600/1162]	lr: 2.000e-03, eta: 8:39:18, time: 0.544, data_time: 0.013, memory: 5142, loss_cls: 0.6940, loss_bbox: 1.2061, loss: 1.9001
2021-12-29 09:33:18,832 - mmdet - INFO - Epoch [1][650/1162]	lr: 2.000e-03, eta: 8:38:09, time: 0.532, data_time: 0.013, memory: 5142, loss_cls: 0.6794, loss_bbox: 1.1886, loss: 1.8680
2021-12-29 09:33:45,535 - mmdet - INFO - Epoch [1][700/1162]	lr: 2.000e-03, eta: 8:37:13, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 0.6674, loss_bbox: 1.0485, loss: 1.7159
2021-12-29 09:34:12,217 - mmdet - INFO - Epoch [1][750/1162]	lr: 2.000e-03, eta: 8:36:19, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 0.6646, loss_bbox: 1.0119, loss: 1.6765
2021-12-29 09:34:38,781 - mmdet - INFO - Epoch [1][800/1162]	lr: 2.000e-03, eta: 8:35:20, time: 0.531, data_time: 0.013, memory: 5142, loss_cls: 0.6487, loss_bbox: 0.9564, loss: 1.6051
2021-12-29 09:35:05,190 - mmdet - INFO - Epoch [1][850/1162]	lr: 2.000e-03, eta: 8:34:14, time: 0.528, data_time: 0.013, memory: 5142, loss_cls: 0.6176, loss_bbox: 0.8406, loss: 1.4582
2021-12-29 09:35:31,799 - mmdet - INFO - Epoch [1][900/1162]	lr: 2.000e-03, eta: 8:33:26, time: 0.532, data_time: 0.013, memory: 5142, loss_cls: 0.6210, loss_bbox: 0.9229, loss: 1.5439
2021-12-29 09:35:58,144 - mmdet - INFO - Epoch [1][950/1162]	lr: 2.000e-03, eta: 8:32:24, time: 0.527, data_time: 0.013, memory: 5142, loss_cls: 1.1693, loss_bbox: 1.1850, loss: 2.3543
2021-12-29 09:36:25,339 - mmdet - INFO - Exp name: tood_r50_fpn_on_input_1x_coco_cloth.py
2021-12-29 09:36:25,340 - mmdet - INFO - Epoch [1][1000/1162]	lr: 2.000e-03, eta: 8:32:14, time: 0.544, data_time: 0.013, memory: 5142, loss_cls: 1.2817, loss_bbox: 1.3174, loss: 2.5991
2021-12-29 09:36:52,114 - mmdet - INFO - Epoch [1][1050/1162]	lr: 2.000e-03, eta: 8:31:39, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2358, loss_bbox: 1.2847, loss: 2.5205
2021-12-29 09:37:18,908 - mmdet - INFO - Epoch [1][1100/1162]	lr: 2.000e-03, eta: 8:31:07, time: 0.536, data_time: 0.013, memory: 5142, loss_cls: 1.2365, loss_bbox: 1.3173, loss: 2.5538
2021-12-29 09:37:45,867 - mmdet - INFO - Epoch [1][1150/1162]	lr: 2.000e-03, eta: 8:30:43, time: 0.539, data_time: 0.013, memory: 5142, loss_cls: 1.2022, loss_bbox: 1.2296, loss: 2.4319
2021-12-29 09:37:52,329 - mmdet - INFO - Saving checkpoint at 1 epochs
2021-12-29 09:38:47,804 - mmdet - INFO - Evaluating bbox...
2021-12-29 09:38:51,494 - mmdet - INFO - Exp name: tood_r50_fpn_on_input_1x_coco_cloth.py
2021-12-29 09:38:51,495 - mmdet - INFO - Epoch(val) [1][793]	bbox_mAP: 0.0170, bbox_mAP_50: 0.0560, bbox_mAP_75: 0.0090, bbox_mAP_s: -1.0000, bbox_mAP_m: 0.0240, bbox_mAP_l: 0.0190, bbox_mAP_copypaste: 0.017 0.056 0.009 -1.000 0.024 0.019
2021-12-29 09:39:21,128 - mmdet - INFO - Epoch [2][50/1162]	lr: 2.000e-03, eta: 8:27:14, time: 0.592, data_time: 0.062, memory: 5142, loss_cls: 1.2236, loss_bbox: 1.2423, loss: 2.4659
2021-12-29 09:39:47,839 - mmdet - INFO - Epoch [2][100/1162]	lr: 2.000e-03, eta: 8:26:45, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 1.2410, loss_bbox: 1.2517, loss: 2.4927
2021-12-29 09:40:14,530 - mmdet - INFO - Epoch [2][150/1162]	lr: 2.000e-03, eta: 8:26:16, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 1.2827, loss_bbox: 1.2900, loss: 2.5726
2021-12-29 09:40:41,392 - mmdet - INFO - Epoch [2][200/1162]	lr: 2.000e-03, eta: 8:25:54, time: 0.537, data_time: 0.013, memory: 5142, loss_cls: 1.2351, loss_bbox: 1.2374, loss: 2.4725
2021-12-29 09:41:08,168 - mmdet - INFO - Epoch [2][250/1162]	lr: 2.000e-03, eta: 8:25:28, time: 0.536, data_time: 0.013, memory: 5142, loss_cls: 1.1736, loss_bbox: 1.1955, loss: 2.3691
2021-12-29 09:41:34,806 - mmdet - INFO - Epoch [2][300/1162]	lr: 2.000e-03, eta: 8:24:57, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2357, loss_bbox: 1.2372, loss: 2.4729
2021-12-29 09:42:01,528 - mmdet - INFO - Epoch [2][350/1162]	lr: 2.000e-03, eta: 8:24:29, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 1.2839, loss_bbox: 1.2587, loss: 2.5425
2021-12-29 09:42:28,154 - mmdet - INFO - Epoch [2][400/1162]	lr: 2.000e-03, eta: 8:23:58, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2595, loss_bbox: 1.2359, loss: 2.4954
2021-12-29 09:42:54,986 - mmdet - INFO - Epoch [2][450/1162]	lr: 2.000e-03, eta: 8:23:35, time: 0.537, data_time: 0.013, memory: 5142, loss_cls: 1.2725, loss_bbox: 1.3049, loss: 2.5773
2021-12-29 09:43:21,637 - mmdet - INFO - Epoch [2][500/1162]	lr: 2.000e-03, eta: 8:23:05, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2867, loss_bbox: 1.2862, loss: 2.5730
2021-12-29 09:43:48,377 - mmdet - INFO - Epoch [2][550/1162]	lr: 2.000e-03, eta: 8:22:38, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2554, loss_bbox: 1.2227, loss: 2.4781
2021-12-29 09:44:15,013 - mmdet - INFO - Epoch [2][600/1162]	lr: 2.000e-03, eta: 8:22:08, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2519, loss_bbox: 1.2955, loss: 2.5474
2021-12-29 09:44:42,014 - mmdet - INFO - Epoch [2][650/1162]	lr: 2.000e-03, eta: 8:21:49, time: 0.540, data_time: 0.013, memory: 5142, loss_cls: 1.2472, loss_bbox: 1.2727, loss: 2.5199
2021-12-29 09:45:08,675 - mmdet - INFO - Epoch [2][700/1162]	lr: 2.000e-03, eta: 8:21:20, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.1740, loss_bbox: 1.2461, loss: 2.4200
2021-12-29 09:45:35,666 - mmdet - INFO - Epoch [2][750/1162]	lr: 2.000e-03, eta: 8:21:00, time: 0.540, data_time: 0.013, memory: 5142, loss_cls: 1.2391, loss_bbox: 1.2960, loss: 2.5351
2021-12-29 09:46:02,395 - mmdet - INFO - Epoch [2][800/1162]	lr: 2.000e-03, eta: 8:20:33, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2462, loss_bbox: 1.2470, loss: 2.4933
2021-12-29 09:46:29,543 - mmdet - INFO - Epoch [2][850/1162]	lr: 2.000e-03, eta: 8:20:17, time: 0.543, data_time: 0.013, memory: 5142, loss_cls: 1.2525, loss_bbox: 1.3128, loss: 2.5653
2021-12-29 09:46:56,271 - mmdet - INFO - Epoch [2][900/1162]	lr: 2.000e-03, eta: 8:19:50, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2501, loss_bbox: 1.2733, loss: 2.5234
2021-12-29 09:47:22,898 - mmdet - INFO - Epoch [2][950/1162]	lr: 2.000e-03, eta: 8:19:19, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.3215, loss_bbox: 1.2575, loss: 2.5790

MeteoriteWeny avatar Dec 29 '21 02:12 MeteoriteWeny
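
The jump described above can be located automatically instead of by eye. A rough sketch, assuming log lines in the mmdet text format shown in this comment; the function names and the 1.3 ratio threshold are arbitrary choices for illustration:

```python
import re

# Matches e.g. "Epoch [1][950/1162] ... loss_cls: 1.1693, loss_bbox: 1.1850, loss: 2.3543"
LOG_PATTERN = re.compile(
    r"Epoch \[(\d+)\]\[(\d+)/\d+\].*?"
    r"loss_cls: ([\d.]+), loss_bbox: ([\d.]+), loss: ([\d.]+)"
)


def parse_losses(lines):
    """Return (epoch, iteration, total_loss) for every training log line."""
    records = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            records.append(
                (int(match.group(1)), int(match.group(2)), float(match.group(5)))
            )
    return records


def find_jumps(records, ratio=1.3):
    """Report entries whose loss is at least `ratio` times the previous entry's."""
    jumps = []
    for prev, cur in zip(records, records[1:]):
        if prev[2] > 0 and cur[2] / prev[2] >= ratio:
            jumps.append((cur[0], cur[1], prev[2], cur[2]))
    return jumps


# Four lines copied from the log above.
sample = [
    "2021-12-29 09:35:05,190 - mmdet - INFO - Epoch [1][850/1162]\tlr: 2.000e-03, eta: 8:34:14, time: 0.528, data_time: 0.013, memory: 5142, loss_cls: 0.6176, loss_bbox: 0.8406, loss: 1.4582",
    "2021-12-29 09:35:31,799 - mmdet - INFO - Epoch [1][900/1162]\tlr: 2.000e-03, eta: 8:33:26, time: 0.532, data_time: 0.013, memory: 5142, loss_cls: 0.6210, loss_bbox: 0.9229, loss: 1.5439",
    "2021-12-29 09:35:58,144 - mmdet - INFO - Epoch [1][950/1162]\tlr: 2.000e-03, eta: 8:32:24, time: 0.527, data_time: 0.013, memory: 5142, loss_cls: 1.1693, loss_bbox: 1.1850, loss: 2.3543",
    "2021-12-29 09:36:25,340 - mmdet - INFO - Epoch [1][1000/1162]\tlr: 2.000e-03, eta: 8:32:14, time: 0.544, data_time: 0.013, memory: 5142, loss_cls: 1.2817, loss_bbox: 1.3174, loss: 2.5991",
]

print(find_jumps(parse_losses(sample)))  # [(1, 950, 1.5439, 2.3543)]
```

Knowing the iteration where the loss jumps makes it easier to inspect the batches and predictions just before that point.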

I hit the same issue. My code already reads "area1 = fp16_clamp((bboxes1[..., 2] - bboxes1[..., 0]), min=0) * fp16_clamp(( bboxes1[..., 3] - bboxes1[..., 1]), min=0) " because I cloned the repository, so I did not have to modify anything. The bug still happens, though, and it shows up at a random point each time I train.

beeper00 avatar Apr 22 '22 01:04 beeper00
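
One possible reason the clamp alone might not be enough, offered as a guess rather than a diagnosis: clamping at 0 repairs negative widths and heights, but it does nothing for NaN coordinates, and a NaN fails the same `input_val >= zero && input_val <= one` comparison that the assertion checks. A plain-Python sketch of that behaviour; `clamp_min`, `box_area`, and `in_unit_interval` are illustrative stand-ins, not the mmdet functions, and the sketch assumes the tensor clamp passes NaN through unchanged:

```python
import math


def clamp_min(value, lower):
    """Clamp a float from below; NaN is passed through (assumed tensor behaviour)."""
    if math.isnan(value):
        return value
    return max(value, lower)


def box_area(box):
    """Area of an (x1, y1, x2, y2) box with width and height clamped at 0."""
    x1, y1, x2, y2 = box
    return clamp_min(x2 - x1, 0.0) * clamp_min(y2 - y1, 0.0)


def in_unit_interval(value):
    """Same comparison as the failing assertion; always False for NaN."""
    return value >= 0.0 and value <= 1.0


valid_box = (0.0, 0.0, 2.0, 3.0)
flipped_box = (2.0, 0.0, 0.0, 3.0)        # x2 < x1: the clamp turns this into area 0
nan_box = (float("nan"), 0.0, 2.0, 3.0)   # the clamp cannot repair this

for box in (valid_box, flipped_box, nan_box):
    print(box, box_area(box))
```

If NaN box predictions are what reaches the loss here, checking the predicted boxes for non-finite values just before the loss call would confirm or rule this out.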