Segfault during push!() if using a large struct in 1.10 due to elsize overflow
I encountered a segfault when pushing a large struct (262144 bytes) into a vector several (2-10) times. The issue is reproducible. Minimal example:
# Code to reproduce
struct TestStruct
    huge_data1::UInt64
    huge_data2::UInt64
    huge_data3::UInt64
    huge_data4::UInt64
    huge_data5::UInt64
    huge_data6::UInt64
    huge_data7::UInt64
    huge_data8::UInt64
end
# Constructors
TestStruct() = TestStruct(0, 0, 0, 0, 0, 0, 0, 0)
struct TestStruct2
    huge_data1::TestStruct
    huge_data2::TestStruct
    huge_data3::TestStruct
    huge_data4::TestStruct
    huge_data5::TestStruct
    huge_data6::TestStruct
    huge_data7::TestStruct
    huge_data8::TestStruct
end
# Constructors
TestStruct2() = TestStruct2(
    TestStruct(),
    TestStruct(),
    TestStruct(),
    TestStruct(),
    TestStruct(),
    TestStruct(),
    TestStruct(),
    TestStruct()
)
struct TestStruct3
    huge_data1::TestStruct2
    huge_data2::TestStruct2
    huge_data3::TestStruct2
    huge_data4::TestStruct2
    huge_data5::TestStruct2
    huge_data6::TestStruct2
    huge_data7::TestStruct2
    huge_data8::TestStruct2
end
# Constructors
TestStruct3() = TestStruct3(
    TestStruct2(),
    TestStruct2(),
    TestStruct2(),
    TestStruct2(),
    TestStruct2(),
    TestStruct2(),
    TestStruct2(),
    TestStruct2()
)
struct TestStruct4
    huge_data1::TestStruct3
    huge_data2::TestStruct3
    huge_data3::TestStruct3
    huge_data4::TestStruct3
    huge_data5::TestStruct3
    huge_data6::TestStruct3
    huge_data7::TestStruct3
    huge_data8::TestStruct3
end
# Constructors
TestStruct4() = TestStruct4(
    TestStruct3(),
    TestStruct3(),
    TestStruct3(),
    TestStruct3(),
    TestStruct3(),
    TestStruct3(),
    TestStruct3(),
    TestStruct3()
)
struct TestStruct5
    huge_data1::TestStruct4
    huge_data2::TestStruct4
    huge_data3::TestStruct4
    huge_data4::TestStruct4
    huge_data5::TestStruct4
    huge_data6::TestStruct4
    huge_data7::TestStruct4
    huge_data8::TestStruct4
end
# Constructors
TestStruct5() = TestStruct5(
    TestStruct4(),
    TestStruct4(),
    TestStruct4(),
    TestStruct4(),
    TestStruct4(),
    TestStruct4(),
    TestStruct4(),
    TestStruct4()
)
# Prints 262144, the element size in bytes
println(sizeof(TestStruct5))
test_vector = Vector{TestStruct5}()
for i = 1:10
    println(i)
    push!(test_vector, TestStruct5())
end
The issue occurs in 1.10.2:
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
Environment:
JULIA_GPG = 3673DF529D9049477F76B37566E3C7DC03D6E495
JULIA_VERSION = 1.10.0
JULIA_PATH = /usr/local/julia
It also occurs in:
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
It does not occur in the nightly build:
Julia Version 1.12.0-DEV.415
Commit aeac2891630 (2024-04-26 03:00 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
Environment:
JULIA_GPG = 3673DF529D9049477F76B37566E3C7DC03D6E495
JULIA_VERSION = 1.10.0
JULIA_PATH = /usr/local/julia
I added a backtrace to this gist: https://gist.github.com/peter-zimmer2/9dc84c97685842a3acf15e67ba0261bb
I get no segfault if I push a small struct of 512 bytes 1e7 times.
I can reproduce this on 1.9.4 (it occurs once you press Enter after running the code above) and on 1.10.2. I don't see it on 1.11-alpha1, 1.11-beta1, or nightly.
It hangs on 1.6, 1.7 and 1.8 at test_vector = Vector{TestStruct5}(). EDIT: Never mind, it looks like it takes a while but segfaults similarly.
An rr trace for 1.10.2 should be here: https://julialang-dumps.s3.amazonaws.com/reports/2024-04-26T09-14-48-Zentrik.tar.zst
This is fixed by 909bceae8a60969dfceba4924142e5efee459d47, the Memory{T} commit, which is not backportable.
Is the issue that Array used to store the elsize as a UInt16 (https://github.com/JuliaLang/julia/blob/bd47eca2c8aacd145b6c5c02e47e2b9ec27ab456/src/julia.h#L192), so an array with larger elements overflows it?
So it seems the constructor should throw an error in v1.10 and earlier? We could implement that on the backports branch.
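As a rough user-level illustration of that idea (an actual fix would presumably live in the runtime on the backports branch), one could refuse to create a vector whose element size does not fit in 16 bits. `checked_vector` and `MAX_INLINE_ELSIZE` are hypothetical names invented for this sketch and are not part of Base. The check is also simplified: it only looks at isbits element types.

```julia
# Hypothetical limit: the largest element size a UInt16 field can hold.
const MAX_INLINE_ELSIZE = Int(typemax(UInt16))  # 65535

# Hypothetical helper: create an empty Vector{T}, but throw instead of
# letting the element size wrap around when T is a too-large isbits type.
function checked_vector(::Type{T}) where {T}
    if isbitstype(T) && sizeof(T) > MAX_INLINE_ELSIZE
        throw(ArgumentError("element size of $(sizeof(T)) bytes exceeds $(MAX_INLINE_ELSIZE)"))
    end
    return Vector{T}()
end

# A small element type passes the check and behaves like a normal vector.
v = checked_vector(UInt64)
push!(v, 1)

# An 80000-byte element type is rejected before any array is created.
try
    checked_vector(NTuple{10_000, UInt64})
catch err
    println(err)
end
```

With the reproduction above, `checked_vector(TestStruct5)` would take the throwing branch, since 262144 is greater than 65535.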
Yeah, this gets an elsize of 0 here, which causes all kinds of issues.
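The wraparound to 0 can be checked with plain integer arithmetic. This is a minimal sketch that assumes only the 16-bit field width mentioned above; the variable names are illustrative and not taken from the Julia sources.

```julia
# Each nesting level holds 8 copies of the previous one, starting from
# 8 UInt64 fields, so the sizes are 64, 512, 4096, 32768 and 262144 bytes.
sizes = [8 * sizeof(UInt64) * 8^(level - 1) for level in 1:5]

# Keep only the low 16 bits of each size, as a UInt16 field would.
truncated = [s % UInt16 for s in sizes]

for (level, (s, t)) in enumerate(zip(sizes, truncated))
    name = level == 1 ? "TestStruct" : string("TestStruct", level)
    println(name, ": sizeof = ", s, ", stored as UInt16 = ", Int(t))
end
```

Only the last size, 262144 = 4 * 65536, loses information: its low 16 bits are all zero. The 512-byte struct fits easily, which matches the observation that pushing it 1e7 times works.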
The ideal fix would be for it to allocate as a ptr array, but that would likely be harder.
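A user-level analogue of the pointer-array idea is to box each element so that the vector stores references instead of the struct inline. The sketch below assumes the extra allocation and indirection are acceptable, and it has not been checked against the crash in this issue. `Box` and `Payload` are hypothetical names, and `Payload` is deliberately small so the example runs quickly.

```julia
# Hypothetical stand-in for a large immutable struct such as TestStruct5.
struct Payload
    data::NTuple{8, UInt64}
end

# Hypothetical mutable wrapper: instances are heap-allocated, so an array
# of Box{T} holds pointers rather than the wrapped values themselves.
mutable struct Box{T}
    value::T
end

boxed = Vector{Box{Payload}}()
for i in 1:10
    push!(boxed, Box(Payload(ntuple(_ -> UInt64(i), 8))))
end

# The stride between elements is one pointer, whatever sizeof(Payload) is.
println(Base.elsize(typeof(boxed)) == sizeof(Ptr{Cvoid}))

# Reading an element back goes through the wrapper.
println(boxed[3].value.data[1])
```

The element stride stays pointer-sized no matter how large the wrapped struct is, so it always fits in a 16-bit field. The trade-off is one heap allocation per element and the loss of contiguous storage.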
That is also an option, as is backporting the layout change. But neither seems worthwhile for a previous version when only a particular edge case is affected.