Segfault using Julia 1.11-alpha2 on AMD EPYC 9554
Running
using BFloat16s # v0.5
A = ones(BFloat16, 10)
A + A
sometimes leads to a segfault, sometimes to a stack overflow, and sometimes one CPU core sits at 100% until interrupted with ^C.
Nothing breaks on my Intel Core i5-12600K that does not support avx512_bf16.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.0-alpha2 (2024-03-18)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> include("../mwe.jl")
ERROR: LoadError: StackOverflowError:
in expression starting at /home/jschulze/tmp/julia-bfloat16/mwe.jl:4
julia> include("../mwe.jl")
ERROR: LoadError: StackOverflowError:
in expression starting at /home/jschulze/tmp/julia-bfloat16/mwe.jl:4
julia> include("../mwe.jl")
ERROR: LoadError: StackOverflowError:
in expression starting at /home/jschulze/tmp/julia-bfloat16/mwe.jl:4
julia>
jschulze@hostname:~/tmp/julia-bfloat16/v0.5.0$ julia +1.11 ../mwe.jl
ERROR: LoadError: StackOverflowError:
in expression starting at /home/jschulze/tmp/julia-bfloat16/mwe.jl:4
jschulze@hostname:~/tmp/julia-bfloat16/v0.5.0$ julia +1.11 ../mwe.jl
Segmentation fault (core dumped)
jschulze@hostname:~/tmp/julia-bfloat16/v0.5.0$ julia +1.11 ../mwe.jl
^C
[207628] signal 2: Interrupt
in expression starting at none:0
_ZN4llvm8ExpectedINS_8ArrayRefINS_6object12Elf_Sym_ImplINS2_7ELFTypeILNS_7support10endiannessE1ELb1EEEEEEEED2Ev at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZNK4llvm6object13ELFObjectFileINS0_7ELFTypeILNS_7support10endiannessE1ELb1EEEE14getSymbolFlagsENS0_11DataRefImplE at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZNK4llvm6object10ObjectFile14getSymbolValueENS0_11DataRefImplE at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZNK4llvm6object13ELFObjectFileINS0_7ELFTypeILNS_7support10endiannessE1ELb1EEEE16getSymbolAddressENS0_11DataRefImplE at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
getAddress at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/usr/include/llvm/Object/ObjectFile.h:408 [inlined]
get_function_name_and_base at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/debuginfo.cpp:746 [inlined]
jl_dylib_DI_for_fptr at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/debuginfo.cpp:1142
jl_getDylibFunctionInfo at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/debuginfo.cpp:1174 [inlined]
jl_getFunctionInfo_impl at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/debuginfo.cpp:1247
ijl_lookup_code_address at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/stackwalk.c:589
lookup at ./stacktraces.jl:108
stacktrace at ./stacktraces.jl:164
stacktrace at ./stacktraces.jl:162 [inlined]
scrub_repl_backtrace at ./client.jl:96
jfptr_scrub_repl_backtrace_70894.1 at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
scrub_repl_backtrace at ./client.jl:103
exec_options at ./client.jl:321
_start at ./client.jl:526
jfptr__start_71122.1 at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/julia.h:2154 [inlined]
true_main at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x7f3a5a5dbd8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
unknown function (ip: (nil))
Allocations: 1 (Pool: 1; Big: 0); GC: 0
Manifest-v1.11.toml
# This file is machine-generated - editing it directly is not advised
julia_version = "1.11.0-alpha2"
manifest_format = "2.0"
project_hash = "911edae1ed7fd2de4577c3badb415b11dc83b1e4"
[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"
version = "1.11.0"
[[deps.BFloat16s]]
deps = ["LinearAlgebra", "Printf", "Random", "Test"]
git-tree-sha1 = "2c7cc21e8678eff479978a0a2ef5ce2f51b63dff"
uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
version = "0.5.0"
[[deps.Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
version = "1.11.0"
[[deps.CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"
version = "1.1.1+0"
[[deps.InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
version = "1.11.0"
[[deps.Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"
version = "1.11.0"
[[deps.LinearAlgebra]]
deps = ["Libdl", "OpenBLAS_jll", "libblastrampoline_jll"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
version = "1.11.0"
[[deps.Logging]]
deps = ["StyledStrings"]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"
version = "1.11.0"
[[deps.Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"
version = "1.11.0"
[[deps.OpenBLAS_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"]
uuid = "4536629a-c528-5b80-bd46-f80d51c5b363"
version = "0.3.26+2"
[[deps.Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"
version = "1.11.0"
[[deps.Random]]
deps = ["SHA"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
version = "1.11.0"
[[deps.SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
version = "0.7.0"
[[deps.Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"
version = "1.11.0"
[[deps.StyledStrings]]
uuid = "f489334b-da3d-4c2e-b8f0-e476e12c162b"
version = "1.11.0"
[[deps.Test]]
deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
version = "1.11.0"
[[deps.Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
version = "1.11.0"
[[deps.libblastrampoline_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"
version = "5.8.0+1"
> Nothing breaks on my Intel Core i5-12600K that does not support avx512_bf16.
BFloat16 is, for arithmetic operations, converted to Float32, and the result is then truncated back down to BFloat16 (but I'm not sure when or how native BFloat16 arithmetic is used where it is available). You can check this with
julia> a = one(BFloat16)
BFloat16(1.0)
julia> @code_lowered a+a
CodeInfo(
1 ─ %1 = BFloat16s.Float32(x)
│ %2 = BFloat16s.Float32(y)
│ %3 = %1 + %2
│ %4 = BFloat16s.BFloat16(%3)
└── return %4
)
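As a rough illustration of the widen, add, narrow path shown in that lowered code, here is a minimal sketch that does not use BFloat16s at all: it treats a BFloat16 as the upper 16 bits of a Float32. The helper names (`bf16_to_f32`, `f32_to_bf16`, `soft_add`) are made up for this example, and the round-to-nearest-even step is an assumption about how the narrowing is done, not code copied from the package.

```julia
# Hypothetical helpers for illustration; not part of BFloat16s.jl.

# A BFloat16 bit pattern is the upper 16 bits of the matching Float32.
bf16_to_f32(b::UInt16) = reinterpret(Float32, UInt32(b) << 16)

# Narrow a Float32 to BFloat16 bits, rounding to nearest even (assumed).
function f32_to_bf16(x::Float32)
    isnan(x) && return 0x7fc0                      # canonical quiet NaN
    u = reinterpret(UInt32, x)
    u += 0x00007fff + ((u >> 16) & 0x00000001)     # rounding bias
    return (u >> 16) % UInt16
end

# Software addition: widen both operands, add in Float32, narrow the result.
soft_add(a::UInt16, b::UInt16) = f32_to_bf16(bf16_to_f32(a) + bf16_to_f32(b))

one_bf16 = 0x3f80                                  # bit pattern of BFloat16(1.0)
println(bf16_to_f32(soft_add(one_bf16, one_bf16))) # prints 2.0
```

A native `fadd bfloat` would skip the two widening steps and the narrowing step, which is the difference the `@code_lowered` and `@code_llvm` output in this thread is meant to reveal.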
I also have an Intel i5 in my MacBook, and with Julia 1.10.2 I cannot reproduce your error, even if I execute this a million times:
julia> using BFloat16s
julia> A = ones(BFloat16,10)
julia> for _ in 1:1000000
A + A
end
julia>
Are you sure that .../mwe.jl really only contains these lines of code that you copied in?
> Are you sure that .../mwe.jl really only contains these lines of code that you copied in?
Yes, I am. I was also testing v0.4.2, hence the v0.5.0/ to separate the environments and the ../ to the common mwe.jl.
> BFloat16 is for arithmetic operations converted to Float32 [...]
Starting with Julia 1.11 (https://github.com/JuliaLang/julia/commit/54870465b164f630310d91f80d33cbd412bf8fc9) and BFloat16s 0.5 (https://github.com/JuliaMath/BFloat16s.jl/pull/51), native LLVM bfloat is used if available. On the AMD CPU, I see the following.
julia> BFloat16s.llvm_storage
true
julia> BFloat16s.llvm_arithmetic
true
julia> a = one(BFloat16)
BFloat16(1.0)
julia> @code_lowered a+a
CodeInfo(
1 ─ %1 = Base.add_float
│ %2 = (%1)(x, y)
└── return %2
)
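To see whether a CPU advertises the feature at all, one option on Linux is to look for the `avx512_bf16` flag in `/proc/cpuinfo`. The sketch below is only a convenience check under that assumption; `cpu_has_flag` is a hypothetical helper, and its result says nothing about whether Julia or LLVM will actually emit native bfloat instructions.

```julia
# Hypothetical helper for illustration; Linux-only, reads /proc/cpuinfo by default.
function cpu_has_flag(flag::AbstractString; path::AbstractString="/proc/cpuinfo")
    isfile(path) || return false
    for line in eachline(path)
        # The first "flags : ..." line lists the features of one logical CPU.
        startswith(line, "flags") || continue
        features = split(last(split(line, ':'; limit=2)))
        return flag in features
    end
    return false
end

println(cpu_has_flag("avx512_bf16"))
```

Comparing this result with `BFloat16s.llvm_arithmetic` on both machines may help narrow down whether the failure correlates with the CPU feature or only with the Julia and package versions.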
But what happens if you look at the LLVM code? For me the same conversion happens there (with 1.11), but you were hoping it would call fadd bfloat directly?
julia> @code_llvm a+a
; Function Signature: +(Core.BFloat16, Core.BFloat16)
; @ /Users/milan/.julia/packages/BFloat16s/u3WQc/src/bfloat16.jl:225 within `+`
define bfloat @"julia_+_5925"(bfloat %"x::BFloat16", bfloat %"y::BFloat16") #0 {
top:
%0 = fpext bfloat %"x::BFloat16" to float
%1 = fpext bfloat %"y::BFloat16" to float
%2 = fadd float %0, %1
%3 = fptrunc float %2 to bfloat
ret bfloat %3
}
Yes, I was hoping for fadd bfloat, but I see the same IR you posted ... :thinking:
Do I need to compile Julia with a custom LLVM that has BF16 enabled ... somehow?
Interestingly, I can't even generate the LLVM IR for A + A from the original MWE:
julia> A = ones(Core.BFloat16, 32);
julia> @code_llvm 2A
ERROR: StackOverflowError:
julia> @code_llvm A + A
ERROR: StackOverflowError:
julia> @code_llvm BFloat16(1) * A
ERROR: StackOverflowError:
Sometimes I even get one core sitting at 100% load just generating the LLVM IR. I am a bit clueless here.
The problem persists on the current nightly, Version 1.12.0-DEV.629 (2024-05-30).
Works for me on AMD EPYC 9654: https://github.com/JuliaLang/julia/issues/54025#issuecomment-2294994413
Please reopen if still broken.