
RuntimeError: expected scalar type Float but found Half

ZihaoZhao opened this issue 2 years ago · 3 comments

Describe the bug RuntimeError: expected scalar type Float but found Half

Reproduction

python tools/test.py configs/mot/bytetrack/bytetrack_yolox_x_crowdhuman_mot17-private.py --eval track
  1. Did you make any modifications on the code or config? Did you understand what you have modified?

No modifications.

  2. What dataset did you use and what task did you run?

MOT17

Environment

  1. Please run python mmtrack/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.5.4-dev
MMCV: 1.3.17
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 10.1
MMTracking: 0.9.0+142922e

  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback: If applicable, paste the error traceback here.

2022-01-27 10:32:39,679 - mmtrack - INFO - 
detector.bbox_head.multi_level_conv_obj.2.bias - torch.Size([1]): 
PretrainedInit: load from https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth 
 
[                                                  ] 0/17757, elapsed: 0s, ETA:Traceback (most recent call last):
  File "tools/test.py", line 224, in <module>
    main()
  File "tools/test.py", line 187, in main
    show_score_thr=args.show_score_thr)
  File "/zhzhao/code/mmtracking/mmtrack/apis/test.py", line 48, in single_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 42, in forward
    return super().forward(*inputs, **kwargs)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 130, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/zhzhao/code/mmtracking/mmtrack/models/mot/base.py", line 136, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/zhzhao/code/mmtracking/mmtrack/models/mot/base.py", line 113, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/zhzhao/code/mmtracking/mmtrack/models/mot/byte_track.py", line 66, in simple_test
    img, img_metas, rescale=rescale)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/models/detectors/single_stage.py", line 103, in simple_test
    feat, img_metas, rescale=rescale)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 360, in simple_test
    return self.simple_test_bboxes(feats, img_metas, rescale=rescale)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/models/dense_heads/dense_test_mixins.py", line 38, in simple_test_bboxes
    *outs, img_metas=img_metas, rescale=rescale)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/models/dense_heads/yolox_head.py", line 294, in get_bboxes
    self._bboxes_nms(cls_scores, bboxes, score_factor, cfg))
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/models/dense_heads/yolox_head.py", line 321, in _bboxes_nms
    dets, keep = batched_nms(bboxes, scores, labels, cfg.nms)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/ops/nms.py", line 307, in batched_nms
    dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/misc.py", line 340, in new_func
    output = old_func(*args, **kwargs)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/ops/nms.py", line 172, in nms
    score_threshold, max_num)
  File "/zhzhao/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/ops/nms.py", line 27, in forward
    bboxes, scores, iou_threshold=float(iou_threshold), offset=offset)
RuntimeError: expected scalar type Float but found Half

ZihaoZhao · Jan 27 '22 10:01

I am having the same problem running the following command:

python tools/test.py configs/mot/bytetrack/bytetrack_yolox_x_crowdhuman_mot17-private-half.py --checkpoint checkpoints/bytetrack_yolox_x_crowdhuman_mot17-private-half_20211218_205500-1985c9f0.pth --out results_mot.pkl --eval track

corrado113 · Feb 09 '22 00:02

Hi @ZihaoZhao @corrado113 ,

I tried to reproduce your issue, but the same command runs smoothly on my machine with: CUDA 11.2, torch 1.6.0, torchvision 0.7.0.

This issue is usually related to mixed-precision weights. Since you didn't modify the original code and no library such as apex is involved, I suspect the cause lies in your PyTorch version (1.5.0): PyTorch only started supporting automatic mixed precision (torch.cuda.amp) in version 1.6 (see this doc for details).
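A quick way to check this hypothesis is to look for weights that ended up in half precision. This is only a sketch, assuming model is the detector/tracker built by tools/test.py:

import torch

# `model` is assumed to be the MMTracking model built by tools/test.py.
half_params = [name for name, p in model.named_parameters()
               if p.dtype == torch.float16]
print(f'{len(half_params)} parameters are float16')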

So, I recommend upgrading PyTorch to 1.6+ and trying again. Alternatively, a manual workaround is to find the variable x that causes the error and convert it with x = x.float() before calling the function that raises the error, as sketched below.
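For reference, here is a minimal sketch of that workaround applied at the NMS call visible in the traceback. The wrapper name nms_fp32 is hypothetical; the idea is simply to cast half-precision inputs to float32 before mmcv's batched_nms:

import torch
from mmcv.ops import batched_nms

def nms_fp32(bboxes, scores, labels, nms_cfg):
    # Cast possibly half-precision inputs to float32 before the NMS op,
    # which expects float tensors here.
    if bboxes.dtype == torch.float16:
        bboxes = bboxes.float()
    if scores.dtype == torch.float16:
        scores = scores.float()
    return batched_nms(bboxes, scores, labels, nms_cfg)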

noahcao · May 06 '22 06:05

This error occurs when the two tensors involved in the operation do not have the same dtype.

Half means dtype = torch.float16, while Float means dtype = torch.float32.

To resolve the error, simply cast your model weights to float32:

import torch

# `model` is the loaded nn.Module whose weights may be in half precision.
for param in model.parameters():
    # Check if the parameter dtype is Half (float16)
    if param.dtype == torch.float16:
        param.data = param.data.to(torch.float32)
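
Note: the loop above is equivalent to calling model = model.float(), the standard torch.nn.Module method that casts all floating-point parameters and buffers in one call.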

RishitToteja · Jun 19 '23 06:06