[bug] CrossEntropyLoss with wrong target shape aborts process (C++ CHECK) instead of raising Python error
Summary
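When nn.CrossEntropyLoss receives a target tensor with the wrong shape (e.g., a 2-D tensor where class indices of shape [N] are expected), OneFlow does not raise a Python exception. Instead, a C++ check fails (is_initialized() in NumAxes, oneflow/core/common/shape.h) and the whole process aborts, so the error cannot be caught with try/except. The expected behavior is a catchable Python error that names the shape mismatch.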
Code to reproduce bug
import oneflow as flow
import oneflow.nn as nn
flow.manual_seed(0)
m = nn.Linear(10, 5) # logits: [32, 5]
x = flow.randn(32, 10)
# ❌ Wrong target shape: should be [32], not [32, 10]
bad_targets = flow.randint(0, 5, x.shape)
ce = nn.CrossEntropyLoss()
loss = ce(m(x), bad_targets)
print(loss)
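Until the shape is validated by the library itself, a user-side guard can turn the mismatch into an ordinary Python error before the loss is called. The following is a minimal sketch, not OneFlow's own validation: the helper names expected_target_shape and check_ce_target_shape are hypothetical, and it assumes class-index targets, i.e. logits of shape (N, C, d1, ...) paired with targets of shape (N, d1, ...). It works on plain shape tuples, so it runs without OneFlow installed.

```python
def expected_target_shape(input_shape):
    """Return the class-index target shape for logits of shape (N, C, d1, ...).

    Hypothetical helper: assumes integer class-index targets, not
    per-class probabilities.
    """
    input_shape = tuple(input_shape)
    if len(input_shape) < 2:
        raise ValueError(
            f"expected logits with at least 2 dims (N, C, ...), got {input_shape}"
        )
    # Drop the class dimension (axis 1); keep batch and any extra dims.
    return (input_shape[0],) + input_shape[2:]


def check_ce_target_shape(input_shape, target_shape):
    """Raise ValueError if target_shape does not match the logits shape."""
    expected = expected_target_shape(input_shape)
    target_shape = tuple(target_shape)
    if target_shape != expected:
        raise ValueError(
            f"target shape {target_shape} does not match logits shape "
            f"{tuple(input_shape)}; expected {expected}"
        )


# Shapes taken from the reproduction script.
check_ce_target_shape((32, 5), (32,))  # correct target: no error

try:
    check_ce_target_shape((32, 5), (32, 10))  # the bad target from the report
except ValueError as err:
    print(err)
```

With real tensors the call would look like check_ce_target_shape(logits.shape, targets.shape) just before ce(logits, targets), assuming the tensor's shape object converts to a tuple.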
Output
terminate called after throwing an instance of 'oneflow::Exception'
what(): Check failed: (is_initialized())
File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in
File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in
File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::ThreadCtx::TryReceiveAndRun()
File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::Instruction::Compute()
File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in
File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::OpCallInstructionUtil::AllocateOutputBlobsMemory(vm::OpCallInstructionPolicy*, vm::Allocator*, vm::Stream const*)
File "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", line <unknown>, in vm::EagerBlobObject::TryAllocateBlobBodyMemory(vm::Allocator*)
File "oneflow/core/common/shape.h", line 140, in NumAxes
CHECK_OR_THROW(is_initialized())
Error Type: oneflow.ErrorProto.check_failed_error
Stack trace (most recent call last) in thread 1923492:
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab52759c17, in
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab5275942c, in
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab52754ca8, in vm::ThreadCtx::TryReceiveAndRun()
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab526f7394, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab526fa777, in vm::Instruction::Compute()
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab526ffb38, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab526ff2ec, in
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab527040fe, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab527026cc, in vm::OpCallInstructionUtil::AllocateOutputBlobsMemory(vm::OpCallInstructionPolicy*, vm::Allocator*, vm::Stream const*)
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab5025ced1, in vm::EagerBlobObject::TryAllocateBlobBodyMemory(vm::Allocator*)
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab5025f655, in vm::EagerBlobObject::ByteSizeOfBlobBody() const
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab5025a08a, in
Object "<pytorch_source>/oneflow/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-b64d744b.so", at 0x78ab4eea277d, in
Aborted (Signal sent by tkill() 1923235 1002)
Aborted (core dumped)
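Because the process dies with SIGABRT, a try/except around the loss call never gets a chance to run. One way to tell an abort from a normal Python error, for example in a regression test, is to run the snippet in a child interpreter and inspect its return code. This is a minimal sketch assuming a POSIX system (as in this report); run_isolated is a hypothetical helper, and the two child snippets only simulate the two outcomes without importing OneFlow.

```python
import signal
import subprocess
import sys


def run_isolated(source):
    """Run Python source in a child interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        return "killed by " + signal.Signals(-proc.returncode).name
    return f"exit code {proc.returncode}"


# Simulates the current behavior: the process aborts.
print(run_isolated("import os; os.abort()"))

# Simulates the desired behavior: an uncaught Python exception.
print(run_isolated("raise ValueError('bad target shape')"))
```

Passing the reproduction script as source instead of the simulated snippets would show which of the two outcomes a given OneFlow build produces.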
System Information
- OS: Ubuntu 22.04.4 LTS (x86_64)
- OneFlow version: 1.0.0.dev20250921+cpu
- Python version: 3.10.16