[bug] CrossEntropyLoss with wrong target shape aborts process (C++ CHECK) instead of raising Python error

Open tinywisdom opened this issue 2 months ago • 0 comments

Summary

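When nn.CrossEntropyLoss receives a target tensor with the wrong shape (e.g., a 2-D tensor such as [N, C] instead of class indices of shape [N]), OneFlow does not raise a clear Python exception. Instead, a C++ CHECK fails inside the VM worker thread and the whole process aborts with a core dump, so the error cannot be caught with try/except.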

Code to reproduce bug

import oneflow as flow
import oneflow.nn as nn

flow.manual_seed(0)

m = nn.Linear(10, 5)           # m(x) yields logits of shape [32, 5]
x = flow.randn(32, 10)

# ❌ Wrong target shape: should be [32], not [32, 10]
bad_targets = flow.randint(0, 5, x.shape)  

ce = nn.CrossEntropyLoss()
loss = ce(m(x), bad_targets)
print(loss)
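
For comparison, here is a minimal sketch of the kind of Python-side check I would expect to fire before the op is dispatched. It assumes the PyTorch-style convention for class-index targets (input [N, C, ...] pairs with target [N, ...]); I have not confirmed that this is exactly the rule OneFlow intends, and the helper name `check_class_index_target` is hypothetical, not an existing OneFlow function. It works on plain shape tuples, so it runs without OneFlow installed.

```python
def check_class_index_target(input_shape, target_shape):
    """Raise ValueError if target_shape cannot hold class indices for input_shape.

    Assumed convention (PyTorch-style, not confirmed as OneFlow's rule):
      input [N, C] or [N, C, d1, ...]  ->  target [N] or [N, d1, ...]
    """
    input_shape, target_shape = tuple(input_shape), tuple(target_shape)
    if len(input_shape) < 2:
        raise ValueError(
            f"expected input with at least 2 dims [N, C, ...], got {input_shape}"
        )
    # Drop the class dimension (axis 1) to get the expected target shape.
    expected = input_shape[:1] + input_shape[2:]
    if target_shape != expected:
        raise ValueError(
            f"CrossEntropyLoss: expected target of shape {expected} "
            f"for input of shape {input_shape}, got {target_shape}"
        )


# Shapes from the repro above: logits [32, 5], bad targets [32, 10].
check_class_index_target((32, 5), (32,))  # correct shape, passes silently
try:
    check_class_index_target((32, 5), (32, 10))
except ValueError as err:
    print(err)
```

In user code this could probably be called as `check_class_index_target(logits.shape, targets.shape)` before invoking the loss, as a stopgap until the library raises a catchable error itself.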

Output

terminate called after throwing an instance of 'oneflow::Exception'
  what():  Check failed: (is_initialized()) 
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in 
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in 
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::ThreadCtx::TryReceiveAndRun()
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::Instruction::Compute()
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in 
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::OpCallInstructionUtil::AllocateOutputBlobsMemory(vm::OpCallInstructionPolicy*, vm::Allocator*, vm::Stream const*)
  File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::EagerBlobObject::TryAllocateBlobBodyMemory(vm::Allocator*)
  File "oneflow/core/common/shape.h", line 140, in NumAxes
    CHECK_OR_THROW(is_initialized())
Error Type: oneflow.ErrorProto.check_failed_error
Stack trace (most recent call last) in thread 1923492:
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab52759c17, in 
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab5275942c, in 
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab52754ca8, in vm::ThreadCtx::TryReceiveAndRun()
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab526f7394, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab526fa777, in vm::Instruction::Compute()
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab526ffb38, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab526ff2ec, in 
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab527040fe, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab527026cc, in vm::OpCallInstructionUtil::AllocateOutputBlobsMemory(vm::OpCallInstructionPolicy*, vm::Allocator*, vm::Stream const*)
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab5025ced1, in vm::EagerBlobObject::TryAllocateBlobBodyMemory(vm::Allocator*)
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab5025f655, in vm::EagerBlobObject::ByteSizeOfBlobBody() const
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab5025a08a, in 
   Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab4eea277d, in 

Aborted (Signal sent by tkill() 1923235 1002)
Aborted (core dumped)
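
Because the abort kills the interpreter, a regression test for this cannot use try/except in the same process. A rough sketch of one way to tell "process aborted" apart from "Python exception raised" is to run the snippet in a child interpreter and inspect how it exited. The helper name `classify_exit` is hypothetical, the signal handling assumes a POSIX system, and only the standard library is used.

```python
import signal
import subprocess
import sys


def classify_exit(source, timeout=60):
    """Run `source` in a fresh interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    if "Traceback (most recent call last)" in proc.stderr:
        return "python exception"
    return f"exit code {proc.returncode}"


# Stand-ins for the two behaviors: a hard abort vs. a catchable error.
print(classify_exit("import os; os.abort()"))
print(classify_exit("raise ValueError('bad target shape')"))
```

Passing the reproduction script above as `source` should, on the affected build, fall into the "killed by" branch. Once the bug is fixed, the expected result would be "python exception".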

System Information

  • OS: Ubuntu 22.04.4 LTS (x86_64)
  • OneFlow version: 1.0.0.dev20250921+cpu
  • Python version: 3.10.16

tinywisdom · Oct 01 '25 13:10