
RuntimeError: Input type (c10::Half) and bias type (float) should be the same

Open zshn25 opened this issue 1 year ago • 3 comments

I get the following error when trying to run training:

  File "/dinov2/dinov2/layers/patch_embed.py", line 75, in forward
    x = self.proj(x)  # B C H W
  File "/dinov2/dinov2/models/vision_transformer.py", line 211, in prepare_tokens_with_masks
    x = self.patch_embed(x)
  File "/dinov2/dinov2/models/vision_transformer.py", line 254, in forward_features
    x = self.prepare_tokens_with_masks(x, masks)
  File "/dinov2/dinov2/models/vision_transformer.py", line 321, in forward
    ret = self.forward_features(*args, **kwargs)
  File "/dinov2/dinov2/train/ssl_meta_arch.py", line 160, in get_teacher_output
    teacher_backbone_output_dict = self.teacher.backbone(x, is_training=True)
  File "/dinov2/dinov2/train/ssl_meta_arch.py", line 229, in forward_backward
    teacher_dino_softmaxed_centered_list, masked_teacher_ibot_softmaxed_centered = get_teacher_output()
  File "/dinov2/dinov2/train/train.py", line 246, in do_train
    loss_dict = model.forward_backward(data, teacher_temp=teacher_temp)
  File "/dinov2/dinov2/train/train.py", line 314, in main
    do_train(cfg, model, resume=not args.no_resume)
  File "/dinov2/dinov2/run/train/train.py", line 29, in __call__
    train_main(self.args)
  File "/dinov2/dinov2/run/train/train.py", line 60, in main
    t()
  File "/dinov2/dinov2/run/train/train.py", line 65, in <module>
    sys.exit(main())
RuntimeError: Input type (c10::Half) and bias type (float) should be the same

My setup is exactly as described in the README.
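
For reference, this kind of dtype mismatch can be reproduced outside the training loop. The snippet below is only a minimal sketch: `proj` is a hypothetical stand-in for the patch-embedding projection, not the DINOv2 layer itself, and the channel and kernel sizes are arbitrary. The exact wording of the message may differ between PyTorch versions and may name the weight rather than the bias.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the patch-embedding projection (not DINOv2 code).
proj = nn.Conv2d(3, 8, kernel_size=14, stride=14)  # parameters default to float32
x = torch.randn(1, 3, 28, 28).half()               # input is float16

try:
    proj(x)
    error_message = None
except RuntimeError as exc:
    # The convolution rejects an input whose dtype differs from its parameters.
    error_message = str(exc)

print(error_message)
```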

zshn25 avatar Nov 21 '23 14:11 zshn25

self.proj had a different dtype than the input. I added self.proj.to(x) at this line, which resolved the first error: https://github.com/facebookresearch/dinov2/blob/da4b3825f0ed64b7398ace00c5062503811d0cff/dinov2/layers/patch_embed.py#L75. But I now get another RuntimeError:

Expected output.scalar_type() == at::ScalarType::Half to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
  File "/home/zsuri/prototyping_dinov2/dinov2/layers/block.py", line 181, in get_attn_bias_and_cat
    cat_tensors = index_select_cat([x.flatten(1) for x in x_list], branges).view(1, -1, x_list[0].shape[-1])
  File "/home/zsuri/prototyping_dinov2/dinov2/layers/block.py", line 201, in drop_add_residual_stochastic_depth_list
    attn_bias, x_cat = get_attn_bias_and_cat(x_list, branges)
  File "/home/zsuri/prototyping_dinov2/dinov2/layers/block.py", line 227, in forward_nested
    x_list = drop_add_residual_stochastic_depth_list(
  File "/home/zsuri/prototyping_dinov2/dinov2/layers/block.py", line 259, in forward
    return self.forward_nested(x_or_x_list)
  File "/home/zsuri/prototyping_dinov2/dinov2/models/vision_transformer.py", line 40, in forward
    x = b(x)
  File "/home/zsuri/prototyping_dinov2/dinov2/models/vision_transformer.py", line 241, in forward_features_list
    x = blk(x)
  File "/home/zsuri/prototyping_dinov2/dinov2/models/vision_transformer.py", line 260, in forward_features
    return self.forward_features_list(x, masks)
  File "/home/zsuri/prototyping_dinov2/dinov2/models/vision_transformer.py", line 329, in forward
    ret = self.forward_features(*args, **kwargs)
  File "/home/zsuri/prototyping_dinov2/dinov2/train/ssl_meta_arch.py", line 235, in forward_backward
    student_global_backbone_output_dict, student_local_backbone_output_dict = self.student.backbone(
  File "/home/zsuri/prototyping_dinov2/dinov2/train/train.py", line 246, in do_train
    loss_dict = model.forward_backward(data, teacher_temp=teacher_temp)
  File "/home/zsuri/prototyping_dinov2/dinov2/train/train.py", line 314, in main
    do_train(cfg, model, resume=not args.no_resume)
  File "/home/zsuri/prototyping_dinov2/dinov2/run/train/train.py", line 29, in __call__
    train_main(self.args)
  File "/home/zsuri/prototyping_dinov2/dinov2/run/train/train.py", line 60, in main
    t()
  File "/home/zsuri/prototyping_dinov2/dinov2/run/train/train.py", line 65, in <module>
    sys.exit(main())
RuntimeError: Expected output.scalar_type() == at::ScalarType::Half to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

zshn25 avatar Nov 21 '23 15:11 zshn25

Did you cast the model to .half()?

qasfb avatar Nov 23 '23 17:11 qasfb

@qasfb, I had to manually cast particular layers to the same dtype as the input in multiple places.
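
A rough sketch of the per-layer workaround described above, for anyone hitting the same error. `PatchProj` is a hypothetical stand-in, not the DINOv2 `PatchEmbed` class, and the sizes are arbitrary. The sketch assumes float16 only when a GPU is available and falls back to float64 on CPU so that it runs on any PyTorch build; the mechanism (layer parameters cast to follow the input) is the same in both cases.

```python
import torch
import torch.nn as nn

# Assumption: use float16 on a GPU (the case in this thread); fall back to
# float64 on CPU so the example runs without half-precision CPU kernels.
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
input_dtype = torch.float16 if use_cuda else torch.float64


class PatchProj(nn.Module):
    """Hypothetical stand-in for a patch-embedding layer."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Conv2d(3, 8, kernel_size=14, stride=14)  # float32 params

    def forward(self, x):
        # Workaround from the thread: cast the layer to the input's
        # dtype and device before calling it.
        self.proj.to(x)
        return self.proj(x)


layer = PatchProj().to(device)
x = torch.randn(1, 3, 28, 28, device=device, dtype=input_dtype)
out = layer(x)
print(out.dtype, tuple(out.shape))
```

Note that `self.proj.to(x)` casts the parameters in place, so the layer no longer holds float32 weights afterwards. That may interact badly with an optimizer or mixed-precision wrapper that expects the original dtype, so this is probably better treated as a diagnostic than as a fix.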

zshn25 avatar Nov 24 '23 13:11 zshn25