
[bug] MaxUnpool2d with invalid indices (-1) crashes process instead of raising Python error

Open tinywisdom opened this issue 2 months ago • 0 comments

Summary

When nn.MaxUnpool2d is called with invalid indices (e.g., all values set to -1), OneFlow does not raise a Python error. Instead, the process aborts on a failed C++ check (CHECK_OR_THROW in max_unpool_kernel.cpp), so the failure cannot be caught with try/except.
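
To tell a hard abort apart from an ordinary Python exception in a regression test, one option is to run the reproducer in a child interpreter and inspect how it exited. The following is a minimal sketch using only the standard library; classify_exit is a hypothetical helper name, not part of OneFlow, and the signal check assumes a POSIX system.

```python
import signal
import subprocess
import sys


def classify_exit(snippet: str) -> str:
    """Run `snippet` in a fresh interpreter and classify how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a negative return code means the child was killed by a signal.
    if proc.returncode == -signal.SIGABRT:
        return "aborted"
    # An uncaught Python exception prints a traceback and exits non-zero.
    if "Traceback (most recent call last)" in proc.stderr:
        return "python-error"
    return "other"
```

Passing the reproducer's source to this helper should presumably report "aborted" for the behavior described here, and "python-error" once the check surfaces as a normal exception.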

Code to reproduce bug

import oneflow as flow
import oneflow.nn as nn
import numpy as np

device = "cpu"
flow.manual_seed(0)
np.random.seed(0)

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(kernel_size=2)

    def forward(self, x, indices):
        return self.unpool(x, indices)

def main():
    m = M().to(device)
    x = flow.tensor(np.random.rand(1, 1, 2, 2), dtype=flow.float32, device=device)

    # Invalid indices: all set to -1
    bad_idx = flow.full_like(x.to(flow.int64), -1)

    print("about to call unpool with invalid indices = -1")
    y = m(x, bad_idx)

    # Force sync to trigger backend error
    print("forcing sync via .numpy() ...")
    _ = y.numpy()

if __name__ == "__main__":
    main()

Output

forcing sync via .numpy() ...
terminate called after throwing an instance of 'oneflow::Exception'
  what():  Check failed: (idx >= 0 && idx < out_elem_num) Found an invalid max index: -1, output volumes are of size 16
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in 
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in 
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::ThreadCtx::TryReceiveAndRun()
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::Instruction::Compute()
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in 
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in StatefulOpKernel::Compute(eager::CallContext*, ep::Stream*, user_op::OpKernel const*, user_op::OpKernelState*, user_op::OpKernelCache const*) const
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in MaxUnpoolNdKernel<(DeviceType)1, float>::Compute(user_op::KernelComputeContext*) const
  File "oneflow/user/kernels/max_unpool_kernel.cpp", line 32, in MaxUnpoolNdForwardOrBackward
    CHECK_OR_THROW(idx >= 0 && idx < out_elem_num)
Error Type: oneflow.ErrorProto.check_failed_error
Stack trace (most recent call last) in thread 1921972:
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a803359c17, in 
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a80335942c, in 
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a803354ca8, in vm::ThreadCtx::TryReceiveAndRun()
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a8032f7394, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a8032fa777, in vm::Instruction::Compute()
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a8032ffb38, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a8032ff2ec, in 
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a80330419f, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a803fa2ca9, in StatefulOpKernel::Compute(eager::CallContext*, ep::Stream*, user_op::OpKernel const*, user_op::OpKernelState*, user_op::OpKernelCache const*) const
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a803aa1adb, in MaxUnpoolNdKernel<(DeviceType)1, float>::Compute(user_op::KernelComputeContext*) const
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a803a98637, in 
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a803a9541b, in 
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x75a7ffaa277d, in 

Aborted (Signal sent by tkill() 1921507 1002)
Aborted (core dumped)
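
Until the kernel check is surfaced as a Python exception, a caller-side guard can reject bad indices before they reach the backend. The sketch below is an assumption-laden workaround, not OneFlow code: unpooled_size and check_unpool_indices are hypothetical names, the output size uses the usual max-unpool formula (in - 1) * stride - 2 * padding + kernel_size, and the valid range is assumed to be one output plane (H_out * W_out), which matches the size of 16 reported in the log for a 2x2 input with kernel_size=2.

```python
def unpooled_size(in_size, kernel_size, stride=None, padding=0):
    """Output length along one spatial dim, using the usual max-unpool formula."""
    stride = kernel_size if stride is None else stride
    return (in_size - 1) * stride - 2 * padding + kernel_size


def check_unpool_indices(flat_indices, in_hw, kernel_size, stride=None, padding=0):
    """Raise ValueError if any index falls outside [0, H_out * W_out).

    Assumption: indices address a single flattened output plane.
    """
    out_h = unpooled_size(in_hw[0], kernel_size, stride, padding)
    out_w = unpooled_size(in_hw[1], kernel_size, stride, padding)
    out_elems = out_h * out_w
    for idx in flat_indices:
        if not 0 <= idx < out_elems:
            raise ValueError(
                f"invalid max index {idx}; valid range is [0, {out_elems})"
            )
    return out_elems
```

With the reproducer above, this could be called as check_unpool_indices(bad_idx.numpy().ravel().tolist(), (2, 2), kernel_size=2) before m(x, bad_idx), which raises a catchable ValueError instead of aborting the process.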

System Information

  • OS: Ubuntu 22.04.4 LTS (x86_64)
  • OneFlow version: 1.0.0.dev20250921+cpu
  • Python version: 3.10.16

tinywisdom, Oct 01 '25 13:10