[bug][graph] Segfault when reading a registered buffer via module._buffers["0"] inside forward (eager OK)
Summary
Reading a buffer registered under the name "0" (a numeric string) through module._buffers["0"] inside forward() works in eager mode but segfaults in Graph mode while executing functional::Add/AddN. The crash occurs even in an otherwise trivial model (a single Linear layer plus a buffer add).
Code to reproduce bug
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"  # or any single GPU

import oneflow as flow
import oneflow.nn as nn


# ---- Minimal BufferList: registers a single buffer under key "0" ----
class BufferListTemplate(nn.Module):
    def __init__(self, *buffers):
        super().__init__()
        assert len(buffers) == 1, "MRE only needs one buffer"
        # Critical: register under the name "0" (numeric string)
        self.register_buffer("0", buffers[0])

    def __getitem__(self, idx: int):
        # Not used in this MRE; present to show a 'normal' accessor
        return getattr(self, str(idx))


# ---- Model: Linear + fetch buffer from _buffers["0"] and add ----
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)
        self.buflist = BufferListTemplate(flow.randn(10))  # float32

    def forward(self, x):
        y = self.fc(x)
        # Trigger: directly read from the internal _buffers dict
        buf = self.buflist._buffers["0"]
        return y + buf  # hits functional::Add/AddN in Graph


# ---- Graph wrapper ----
class G(nn.Graph):
    def __init__(self, m):
        super().__init__()
        self.m = m

    def build(self, x):
        return self.m(x)


def main():
    flow.manual_seed(0)
    model = MyModel()
    x = flow.randn(2, 10)
    print("------------------------")
    # Eager: usually OK
    out_eager = model(x)
    print("------------------------")
    # Graph: typically segfaults here
    g = G(model)
    out_graph = g(x)  # <-- crash expected
    print("------------------------")
    # If it doesn't crash, compare numerics (rarely reached)
    import numpy as np
    np.testing.assert_allclose(out_eager.numpy(), out_graph.numpy(), rtol=1e-3, atol=1e-2)


if __name__ == "__main__":
    main()
Observed Output (abridged)
------------------------
------------------------
Stack trace (most recent call last):
... oneflow/_oneflow_internal.cpython-310-...so
... functional::add(...)
... functional::Add(TensorTuple const&, bool)
... functional::impl::AddNFunctor::operator()
... OpInterpUtil::Dispatch(...)
Segmentation fault (Address not mapped to object [0x61])
Segmentation fault (core dumped)
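Because the segfault kills the interpreter, anything after the Graph call (such as the numeric comparison) never runs in the same process. A minimal sketch of one way to observe the crash from a harness is to run the script in a child process with CPython's `-X faulthandler` option and inspect the return code. The helper names `run_isolated` and `describe_exit` are hypothetical and not part of OneFlow, and the sketch assumes a POSIX system, where `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(args, timeout=120):
    """Run `python -X faulthandler <args>` in a child process.

    Hypothetical helper: `args` is whatever follows the interpreter on the
    command line, e.g. ["repro.py"] or ["-c", "print('hi')"].
    """
    cmd = [sys.executable, "-X", "faulthandler", *args]
    # capture_output keeps the child's stdout/stderr (including the
    # faulthandler traceback, if any) available for the report.
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)


def describe_exit(returncode):
    """Translate a subprocess return code into a short description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative value -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Harmless demo command; swap in the path of the reproduction script.
    result = run_isolated(["-c", "print('hello from the child')"])
    print(describe_exit(result.returncode))
```

To use it with this report, the reproduction script above could be saved under a file name of your choosing (for example `repro.py`, an assumed name) and passed as `run_isolated(["repro.py"])`; `result.stderr` would then hold whatever traceback the child printed before it died.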
System Information
- OS: Ubuntu 22.04.4 LTS (x86_64)
- OneFlow version: 1.0.0.dev20250921+cpu
- Python version: 3.10.16