
Segfault during push!() if using a large struct in 1.10 due to elsize overflow

Open peter-zimmer2 opened this issue 1 year ago • 7 comments

I encountered a segfault when pushing a large struct (262144 bytes) into a vector several (2-10) times. The issue is reproducible. Minimal example:

#Code to reproduce
struct TestStruct
    huge_data1::UInt64
    huge_data2::UInt64
    huge_data3::UInt64
    huge_data4::UInt64
    huge_data5::UInt64
    huge_data6::UInt64
    huge_data7::UInt64
    huge_data8::UInt64
end
# Constructors
TestStruct() = TestStruct(0, 0, 0, 0, 0, 0, 0, 0)

struct TestStruct2
    huge_data1::TestStruct
    huge_data2::TestStruct
    huge_data3::TestStruct
    huge_data4::TestStruct
    huge_data5::TestStruct
    huge_data6::TestStruct
    huge_data7::TestStruct
    huge_data8::TestStruct
end
#constructors
TestStruct2() = TestStruct2(
    TestStruct(),
    TestStruct(),
    TestStruct(),
    TestStruct(),
    TestStruct(),
    TestStruct(),
    TestStruct(),
    TestStruct()
)

struct TestStruct3
    huge_data1::TestStruct2
    huge_data2::TestStruct2
    huge_data3::TestStruct2
    huge_data4::TestStruct2
    huge_data5::TestStruct2
    huge_data6::TestStruct2
    huge_data7::TestStruct2
    huge_data8::TestStruct2
end
#constructors
TestStruct3() = TestStruct3(
    TestStruct2(),
    TestStruct2(),
    TestStruct2(),
    TestStruct2(),
    TestStruct2(),
    TestStruct2(),
    TestStruct2(),
    TestStruct2()
)

struct TestStruct4
    huge_data1::TestStruct3
    huge_data2::TestStruct3
    huge_data3::TestStruct3
    huge_data4::TestStruct3
    huge_data5::TestStruct3
    huge_data6::TestStruct3
    huge_data7::TestStruct3
    huge_data8::TestStruct3
end
#constructors
TestStruct4() = TestStruct4(
    TestStruct3(),
    TestStruct3(),
    TestStruct3(),
    TestStruct3(),
    TestStruct3(),
    TestStruct3(),
    TestStruct3(),
    TestStruct3()
)

struct TestStruct5
    huge_data1::TestStruct4
    huge_data2::TestStruct4
    huge_data3::TestStruct4
    huge_data4::TestStruct4
    huge_data5::TestStruct4
    huge_data6::TestStruct4
    huge_data7::TestStruct4
    huge_data8::TestStruct4
end
#constructors
TestStruct5() = TestStruct5(
    TestStruct4(),
    TestStruct4(),
    TestStruct4(),
    TestStruct4(),
    TestStruct4(),
    TestStruct4(),
    TestStruct4(),
    TestStruct4()
)

println(sizeof(TestStruct5))

test_vector = Vector{TestStruct5}()
for i = 1:10
    println(i)
    push!(test_vector, TestStruct5())
end

The issue occurs in 1.10.2:

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
Environment:
  JULIA_GPG = 3673DF529D9049477F76B37566E3C7DC03D6E495
  JULIA_VERSION = 1.10.0
  JULIA_PATH = /usr/local/julia

It also occurs in:

Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)

It does not occur in the nightly build:

Julia Version 1.12.0-DEV.415
Commit aeac2891630 (2024-04-26 03:00 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
Environment:
  JULIA_GPG = 3673DF529D9049477F76B37566E3C7DC03D6E495
  JULIA_VERSION = 1.10.0
  JULIA_PATH = /usr/local/julia

I added a backtrace to this gist: https://gist.github.com/peter-zimmer2/9dc84c97685842a3acf15e67ba0261bb

I get no segfault if I push a small struct of 512 bytes 1e7 times.

peter-zimmer2 avatar Apr 26 '24 06:04 peter-zimmer2

I can reproduce this on 1.9.4 (it occurs if you press Enter after running the code above) and on 1.10.2. I don't see it on 1.11-alpha1, 1.11-beta1, or nightly. It hangs on 1.6, 1.7, and 1.8 at test_vector = Vector{TestStruct5}(). EDIT: Never mind, it looks like it takes a while but segfaults similarly.

An rr trace for 1.10.2 should be here: https://julialang-dumps.s3.amazonaws.com/reports/2024-04-26T09-14-48-Zentrik.tar.zst

Zentrik avatar Apr 26 '24 09:04 Zentrik

This gets fixed in 909bceae8a60969dfceba4924142e5efee459d47, which is the Memory{T} commit, which is not backportable.

gbaraldi avatar Apr 26 '24 15:04 gbaraldi

Is the issue that Array used to store the elsize as a UInt16 (https://github.com/JuliaLang/julia/blob/bd47eca2c8aacd145b6c5c02e47e2b9ec27ab456/src/julia.h#L192), so that it overflows if you have an array with larger elements?

oscardssmith avatar Apr 26 '24 15:04 oscardssmith

So it seems the constructor should error in v1.10 and earlier? We could implement that on the backports branch.

vtjnash avatar Apr 26 '24 16:04 vtjnash
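
As a user-level illustration of the "constructor should error" idea above, here is a minimal sketch. The function `checked_vector` is hypothetical and not part of Julia; an actual fix would live in the runtime's array constructor. It assumes, as suggested in this thread, that element sizes must fit in a 16-bit field on v1.10 and earlier.

```julia
# Hypothetical helper (not part of Julia): refuse to build a Vector whose
# inline element size would not fit in a 16-bit element-size field.
# Assumption: only isbits element types are stored inline at their full size.
function checked_vector(::Type{T}) where {T}
    if isbitstype(T) && sizeof(T) > typemax(UInt16)
        throw(ArgumentError(
            "element type $T is $(sizeof(T)) bytes; " *
            "the assumed inline limit is $(typemax(UInt16)) bytes"))
    end
    return Vector{T}()
end

small = checked_vector(UInt64)   # fine: 8-byte elements
```

With the structs from the report, `checked_vector(TestStruct5)` would throw an `ArgumentError` under this sketch instead of returning a vector.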

Yeah, this gets an elsize of 0 here, which causes all kinds of issues.

gbaraldi avatar Apr 26 '24 16:04 gbaraldi
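
The overflow discussed above can be illustrated with plain integer arithmetic. This is a minimal sketch, assuming (as suggested in this thread) that the pre-1.11 array header keeps the element size in a 16-bit unsigned field. The variable names are illustrative only and do not come from the runtime.

```julia
# Sizes of the nested structs from the report: each level holds 8 copies of
# the previous one, starting from 8 UInt64 fields (64 bytes).
struct_sizes = [8 * sizeof(UInt64) * 8^(level - 1) for level in 1:5]

# Assumption: the element size is stored in a UInt16 field, so only the
# low 16 bits of the true size survive the conversion.
stored_sizes = [sz % UInt16 for sz in struct_sizes]

for (level, (sz, stored)) in enumerate(zip(struct_sizes, stored_sizes))
    println("level $level: sizeof = $sz, as UInt16 = $(Int(stored))")
end
```

Under that assumption, the first four levels fit (including the 512-byte case from the report), while the 262144-byte struct is exactly 4 × 65536 and wraps to 0.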

The ideal fix would be for it to allocate as a ptr array, but that would likely be harder.

oscardssmith avatar Apr 26 '24 16:04 oscardssmith
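
A rough user-level analogue of the pointer-array idea above is to store boxed values, so that the vector holds references instead of the payload bytes. This is only a sketch: `BigPayload` is a small hypothetical stand-in for the report's `TestStruct5`, and whether this avoids the crash on 1.10 has not been verified here.

```julia
# Hypothetical stand-in for a large immutable struct such as TestStruct5.
struct BigPayload
    data::NTuple{16, UInt64}
end
BigPayload() = BigPayload(ntuple(_ -> UInt64(0), 16))

# Ref-wrapped values are mutable boxes, so the vector stores pointers to
# them and the per-element size no longer depends on the payload size.
boxed = Vector{Base.RefValue{BigPayload}}()
for _ in 1:10
    push!(boxed, Ref(BigPayload()))
end

first_payload = boxed[1][]   # dereference the box to get the payload back
```

The trade-off is one extra heap allocation and one pointer indirection per element.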

That's also an option, or we could backport the layout change. But it doesn't seem worthwhile for a previous version, given that it only affects a particular edge case.

vtjnash avatar Apr 26 '24 16:04 vtjnash