vision Runaway mask_loss for MaskRCNN when using non-binary mask.

🐛 Describe the bug

While developing my own custom data pipeline for MaskRCNN, I ran into a bug where mask_loss would run away to massive values (both positive and negative). According to the official documentation, targets["masks"] should be a binary mask for each instance. On further inspection, my masks were in fact not binary, because I had resized them with linear interpolation for augmentation.
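The issue does not say which resize function was used. As a minimal sketch, assuming masks are resized as float tensors with `torch.nn.functional.interpolate`, linear (bilinear) interpolation blends neighbouring pixels and leaves fractional values along object edges, while `mode="nearest"` only copies existing pixel values and keeps the mask binary. The 4x4 toy mask is made up for illustration.

```python
import torch
import torch.nn.functional as F

# A toy binary mask shaped (batch, channel, H, W); the left half is foreground.
mask = torch.zeros(1, 1, 4, 4)
mask[..., :, :2] = 1.0

# Bilinear resizing blends neighbouring pixels, so pixels on the
# foreground/background boundary get values strictly between 0 and 1.
bilinear = F.interpolate(mask, size=(7, 7), mode="bilinear", align_corners=False)

# Nearest-neighbour resizing copies existing pixel values, so the
# resized mask still contains only 0 and 1.
nearest = F.interpolate(mask, size=(7, 7), mode="nearest")

print("bilinear values:", torch.unique(bilinear).tolist())
print("nearest values:", torch.unique(nearest).tolist())
```

If a resize step has to stay linear (for example, a shared image/mask transform), thresholding the result back to {0, 1} before building the target is another option.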

Although I was able to identify the cause, I noticed that several other users have run into the same issue <1, 2>.

Looking at roi_heads, I found that it performs no check for non-binary masks. I am therefore opening this bug report to track a pull request I am prototyping.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn_v2

mask_rcnn = maskrcnn_resnet50_fpn_v2(weights='DEFAULT')
mask_rcnn.to('cuda')
mask_rcnn.train()

inputs = torch.randn(1, 3, 800, 800).float().to('cuda')
targets = [{
    'masks': torch.randint(0, 10, (5, 800, 800), dtype=torch.uint8).to('cuda'),  # this is NOT a binary mask
    'boxes': torch.tensor([
        [103., 137., 262., 236.],
        [83., 393., 250., 494.],
        [281., 389., 441., 487.],
        [289., 134., 447., 231.],
        [202., 264., 355., 362.],
    ], dtype=torch.float32).to('cuda'),
    'labels': torch.tensor([1, 2, 3, 4, 5], dtype=torch.int64).to('cuda'),
}]
losses = mask_rcnn(inputs, targets)  # this should throw an error since 'masks' is not a binary mask
print(losses['loss_mask'])
```
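To see why the loss can run away in both directions, recall that torchvision's `maskrcnn_loss` scores the mask head with `F.binary_cross_entropy_with_logits`. The sketch below uses hand-picked scalar logits and targets, not real RoI-aligned masks. With a target of 1 the per-pixel loss never drops below 0, but with a non-binary target such as 9 it has no lower bound: large positive logits drive it arbitrarily negative, and negative logits make it large and positive.

```python
import torch
import torch.nn.functional as F

# A few illustrative mask logits, from confidently background to
# extremely confident foreground.
logits = torch.tensor([-10.0, 0.0, 10.0, 100.0])

losses = {}
for target_value in (1.0, 9.0):
    # 1.0 is a valid binary target; 9.0 mimics a value from a non-binary mask.
    target = torch.full_like(logits, target_value)
    losses[target_value] = F.binary_cross_entropy_with_logits(
        logits, target, reduction="none"
    )
    print(f"target={target_value}: {losses[target_value].tolist()}")
```

Because the optimizer minimizes this quantity, a target above 1 rewards ever-larger logits, which is consistent with the unbounded loss_mask seen in the reproduction.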

Versions

Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.10.220-209.869.amzn2.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 535.183.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 7
BogoMIPS: 4999.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 64 KiB (2 instances)
L1i cache: 64 KiB (2 instances)
L2 cache: 2 MiB (2 instances)
L3 cache: 35.8 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status

Aug 08 '24 19:08 KBlansit

My question is whether the following would be sufficient for a pull request:

Link to where I would start modifying roi_heads

gt_masks = [t["masks"] for t in targets]
for gt_mask in gt_masks:
    # every value in an instance mask must be 0 or 1
    mask_values = torch.unique(gt_mask)
    only_binary_values = ((mask_values == 0) | (mask_values == 1)).all()
    torch._assert(only_binary_values, 'a target["masks"] contains a non-binary value.')
gt_labels = [t["labels"] for t in targets]
rcnn_loss_mask = maskrcnn_loss(mask_logits, mask_proposals, gt_masks, gt_labels, pos_matched_idxs)
loss_mask = {"loss_mask": rcnn_loss_mask}

Aug 08 '24 19:08 KBlansit

Hi @KBlansit

Unfortunately, such input checks can be quite expensive, and they would unnecessarily slow down user code that is already correct.

One thing we can probably do (if it isn't done already) is assert that the mask dtype is uint8. That would be an O(1) check instead of an O(n) one. If there are ways to make the documentation clearer, I'm also happy to consider a PR on that front.
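A minimal sketch contrasting the two kinds of check discussed in this thread, written as standalone helpers. `check_mask_dtype` and `check_mask_values` are hypothetical names for illustration, not torchvision functions. The dtype check reads only tensor metadata, so its cost does not depend on mask size; the value check has to visit every pixel. Note that the dtype check on its own would still accept a uint8 mask containing values other than 0 and 1, like the one in the reproduction above.

```python
import torch


def check_mask_dtype(masks: torch.Tensor) -> None:
    # Hypothetical helper. O(1): only inspects the dtype, never the pixel data.
    if masks.dtype != torch.uint8:
        raise TypeError(f"target['masks'] must be uint8, got {masks.dtype}")


def check_mask_values(masks: torch.Tensor) -> None:
    # Hypothetical helper. O(n) in the number of pixels: every element must
    # be 0 or 1. On CUDA, turning the result into a Python bool also forces
    # a device-to-host synchronization.
    if not ((masks == 0) | (masks == 1)).all():
        raise ValueError("target['masks'] contains a non-binary value")


# Usage: a uint8 mask with one stray value of 9 passes the dtype check
# but is rejected by the value check.
bad_values = torch.zeros(2, 4, 4, dtype=torch.uint8)
bad_values[0, 0, 0] = 9
check_mask_dtype(bad_values)
try:
    check_mask_values(bad_values)
except ValueError as err:
    print("value check:", err)
```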

Aug 12 '24 10:08 NicolasHug