BFloat16s.jl Segfault using Julia 1.11-alpha2 on AMD EPYC 9554

Running

using BFloat16s # v0.5

A = ones(BFloat16, 10)
A + A

sometimes leads to a segfault, sometimes to a stack overflow, and sometimes to one CPU core sitting at 100% until I interrupt it with ^C. Nothing breaks on my Intel Core i5-12600K that does not support avx512_bf16.
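Since the outcome differs from run to run, it can help to tally the results over several fresh processes. The following is only a minimal sketch: `tally_runs` is a hypothetical helper (not part of Julia or BFloat16s.jl), and it has no timeout, so a run that hangs at 100% CPU would still have to be interrupted by hand.

```julia
# Sketch: run a command several times in fresh processes and tally the outcomes.
# `tally_runs` is a hypothetical helper, not part of Julia or BFloat16s.jl.
function tally_runs(cmd::Cmd, n::Integer)
    counts = Dict{String,Int}()
    for _ in 1:n
        # ignorestatus keeps `run` from throwing on a non-zero exit or a signal.
        p = run(pipeline(ignorestatus(cmd); stdout = devnull, stderr = devnull))
        # A process killed by a signal (e.g. SIGSEGV) reports a non-zero termsignal.
        outcome = p.termsignal != 0 ? "signal $(p.termsignal)" : "exit $(p.exitcode)"
        counts[outcome] = get(counts, outcome, 0) + 1
    end
    return counts
end

# Usage with a trivial stand-in command; replace the `-e` part with the path to
# the actual script to test the MWE.
cmd = `$(Base.julia_cmd()) --startup-file=no -e "exit(0)"`
println(tally_runs(cmd, 3))
```

A stack overflow reported as a Julia error would show up as a non-zero exit code, while a segfault would be expected to show up as a signal.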

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.0-alpha2 (2024-03-18)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> include("../mwe.jl")
ERROR: LoadError: StackOverflowError:
in expression starting at /home/jschulze/tmp/julia-bfloat16/mwe.jl:4

julia> include("../mwe.jl")
ERROR: LoadError: StackOverflowError:
in expression starting at /home/jschulze/tmp/julia-bfloat16/mwe.jl:4

julia> include("../mwe.jl")
ERROR: LoadError: StackOverflowError:
in expression starting at /home/jschulze/tmp/julia-bfloat16/mwe.jl:4

julia> 
jschulze@hostname:~/tmp/julia-bfloat16/v0.5.0$ julia +1.11 ../mwe.jl 
ERROR: LoadError: StackOverflowError:
in expression starting at /home/jschulze/tmp/julia-bfloat16/mwe.jl:4
jschulze@hostname:~/tmp/julia-bfloat16/v0.5.0$ julia +1.11 ../mwe.jl 
Segmentation fault (core dumped)
jschulze@hostname:~/tmp/julia-bfloat16/v0.5.0$ julia +1.11 ../mwe.jl 
^C
[207628] signal 2: Interrupt
in expression starting at none:0
_ZN4llvm8ExpectedINS_8ArrayRefINS_6object12Elf_Sym_ImplINS2_7ELFTypeILNS_7support10endiannessE1ELb1EEEEEEEED2Ev at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZNK4llvm6object13ELFObjectFileINS0_7ELFTypeILNS_7support10endiannessE1ELb1EEEE14getSymbolFlagsENS0_11DataRefImplE at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZNK4llvm6object10ObjectFile14getSymbolValueENS0_11DataRefImplE at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZNK4llvm6object13ELFObjectFileINS0_7ELFTypeILNS_7support10endiannessE1ELb1EEEE16getSymbolAddressENS0_11DataRefImplE at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
getAddress at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/usr/include/llvm/Object/ObjectFile.h:408 [inlined]
get_function_name_and_base at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/debuginfo.cpp:746 [inlined]
jl_dylib_DI_for_fptr at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/debuginfo.cpp:1142
jl_getDylibFunctionInfo at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/debuginfo.cpp:1174 [inlined]
jl_getFunctionInfo_impl at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/debuginfo.cpp:1247
ijl_lookup_code_address at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/stackwalk.c:589
lookup at ./stacktraces.jl:108
stacktrace at ./stacktraces.jl:164
stacktrace at ./stacktraces.jl:162 [inlined]
scrub_repl_backtrace at ./client.jl:96
jfptr_scrub_repl_backtrace_70894.1 at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
scrub_repl_backtrace at ./client.jl:103
exec_options at ./client.jl:321
_start at ./client.jl:526
jfptr__start_71122.1 at /home/jschulze/.julia/juliaup/julia-1.11.0-alpha2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/julia.h:2154 [inlined]
true_main at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci4-1/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x7f3a5a5dbd8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
unknown function (ip: (nil))
Allocations: 1 (Pool: 1; Big: 0); GC: 0

Manifest-v1.11.toml

# This file is machine-generated - editing it directly is not advised

julia_version = "1.11.0-alpha2"
manifest_format = "2.0"
project_hash = "911edae1ed7fd2de4577c3badb415b11dc83b1e4"

[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"
version = "1.11.0"

[[deps.BFloat16s]]
deps = ["LinearAlgebra", "Printf", "Random", "Test"]
git-tree-sha1 = "2c7cc21e8678eff479978a0a2ef5ce2f51b63dff"
uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
version = "0.5.0"

[[deps.Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
version = "1.11.0"

[[deps.CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"
version = "1.1.1+0"

[[deps.InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
version = "1.11.0"

[[deps.Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"
version = "1.11.0"

[[deps.LinearAlgebra]]
deps = ["Libdl", "OpenBLAS_jll", "libblastrampoline_jll"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
version = "1.11.0"

[[deps.Logging]]
deps = ["StyledStrings"]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"
version = "1.11.0"

[[deps.Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"
version = "1.11.0"

[[deps.OpenBLAS_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"]
uuid = "4536629a-c528-5b80-bd46-f80d51c5b363"
version = "0.3.26+2"

[[deps.Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"
version = "1.11.0"

[[deps.Random]]
deps = ["SHA"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
version = "1.11.0"

[[deps.SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
version = "0.7.0"

[[deps.Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"
version = "1.11.0"

[[deps.StyledStrings]]
uuid = "f489334b-da3d-4c2e-b8f0-e476e12c162b"
version = "1.11.0"

[[deps.Test]]
deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
version = "1.11.0"

[[deps.Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
version = "1.11.0"

[[deps.libblastrampoline_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"
version = "5.8.0+1"

Mar 28 '24 18:03 jonas-schulze

> Nothing breaks on my Intel Core i5-12600K that does not support avx512_bf16.

For arithmetic operations, BFloat16 is converted to Float32 and the result is then truncated back down to BFloat16 (though I'm not sure when or how native BFloat16 arithmetic is used where it is available). You can check this with

julia> a = one(BFloat16)
BFloat16(1.0)

julia> @code_lowered a+a
CodeInfo(
1 ─ %1 = BFloat16s.Float32(x)
│   %2 = BFloat16s.Float32(y)
│   %3 = %1 + %2
│   %4 = BFloat16s.BFloat16(%3)
└──      return %4
)
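The widen, add, narrow pattern in that lowered code can be sketched in plain Julia on raw 16-bit patterns. This is an illustration only: `to_bf16_bits`, `from_bf16_bits` and `bf16_add` are hypothetical names, not BFloat16s.jl API, and the narrowing step here rounds to nearest even, which is an assumption about how the "truncation" is done rather than something taken from the package source.

```julia
# Sketch: emulate BFloat16 addition in software via Float32.
# All function names here are hypothetical, not BFloat16s.jl API.

# Keep the upper 16 bits of a Float32, rounding to nearest, ties to even.
# Assumption: rounding rather than plain truncation. NaN handling is omitted.
function to_bf16_bits(x::Float32)::UInt16
    u = reinterpret(UInt32, x)
    u += 0x00007fff + ((u >> 16) & 0x00000001)
    return (u >> 16) % UInt16
end

# Widen a 16-bit pattern back to Float32 by appending 16 zero bits.
from_bf16_bits(b::UInt16)::Float32 = reinterpret(Float32, UInt32(b) << 16)

# Addition: widen both operands, add in Float32, narrow the result.
bf16_add(a::UInt16, b::UInt16) = to_bf16_bits(from_bf16_bits(a) + from_bf16_bits(b))

one_bits = to_bf16_bits(1.0f0)
two_bits = bf16_add(one_bits, one_bits)
println(from_bf16_bits(two_bits))  # prints 2.0
```

A bfloat16 value shares its sign and exponent layout with Float32, which is why widening is just a 16-bit shift.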

I also have an Intel i5 in my MacBook, and with Julia 1.10.2 I cannot reproduce your error, even if I execute this a million times:

julia> using BFloat16s
julia> A = ones(BFloat16,10)
julia> for _ in 1:1000000
           A + A
       end

julia>

Are you sure that .../mwe.jl really only contains these lines of code that you copied in?

Mar 28 '24 18:03 milankl

> Are you sure that .../mwe.jl really only contains these lines of code that you copied in?

Yes, I am. I was also testing v0.4.2, hence the v0.5.0/ to separate the environments and the ../ to the common mwe.jl.

> BFloat16 is for arithmetic operations converted to Float32 [...]

Starting with Julia 1.11 (https://github.com/JuliaLang/julia/commit/54870465b164f630310d91f80d33cbd412bf8fc9) and BFloat16s 0.5 (https://github.com/JuliaMath/BFloat16s.jl/pull/51), native LLVM bfloat is used if available. On the AMD CPU, I see the following.

julia> BFloat16s.llvm_storage
true

julia> BFloat16s.llvm_arithmetic
true

julia> a = one(BFloat16)
BFloat16(1.0)

julia> @code_lowered a+a
CodeInfo(
1 ─ %1 = Base.add_float
│   %2 = (%1)(x, y)
└──      return %2
)
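To compare machines, one can also check whether the CPU itself advertises the avx512_bf16 flag. The sketch below is Linux-only and reads /proc/cpuinfo; `has_cpu_flag` is a hypothetical helper, not part of Julia or BFloat16s.jl, and it does not say anything about how BFloat16s.jl decides on `llvm_arithmetic`.

```julia
# Sketch: check whether the CPU advertises a given feature flag (Linux only).
# `has_cpu_flag` is a hypothetical helper, not part of Julia or BFloat16s.jl.
function has_cpu_flag(flag::AbstractString; cpuinfo::AbstractString = "/proc/cpuinfo")
    isfile(cpuinfo) || return false          # not Linux, or file unavailable
    for line in eachline(cpuinfo)
        startswith(line, "flags") || continue
        # Everything after the colon is a whitespace-separated list of flags.
        parts = split(line, ':'; limit = 2)
        length(parts) == 2 || continue
        return flag in split(parts[2])       # first "flags" line is enough
    end
    return false
end

println("avx512_bf16 advertised: ", has_cpu_flag("avx512_bf16"))
```

Only the first `flags` line is inspected, on the assumption that all cores report the same feature set.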

Mar 28 '24 18:03 jonas-schulze

But what happens if you look at the LLVM code? For me the same conversion happens there (with 1.11), but you were hoping it would call fadd bfloat directly?

julia> @code_llvm a+a
; Function Signature: +(Core.BFloat16, Core.BFloat16)
;  @ /Users/milan/.julia/packages/BFloat16s/u3WQc/src/bfloat16.jl:225 within `+`
define bfloat @"julia_+_5925"(bfloat %"x::BFloat16", bfloat %"y::BFloat16") #0 {
top:
  %0 = fpext bfloat %"x::BFloat16" to float
  %1 = fpext bfloat %"y::BFloat16" to float
  %2 = fadd float %0, %1
  %3 = fptrunc float %2 to bfloat
  ret bfloat %3
}

Mar 28 '24 18:03 milankl

Yes, I was hoping for fadd bfloat, but I see the same IR you posted ... :thinking: Do I need to compile Julia with a custom LLVM that has BF16 enabled somehow?

Interestingly, I can't even generate the LLVM IR for A + A from the original MWE:

julia> A = ones(Core.BFloat16, 32);

julia> @code_llvm 2A
ERROR: StackOverflowError:

julia> @code_llvm A + A
ERROR: StackOverflowError:

julia> @code_llvm BFloat16(1) * A
ERROR: StackOverflowError:

Sometimes I even get one core sitting at 100% load just generating the LLVM IR. I am a bit clueless here.

Apr 02 '24 09:04 jonas-schulze

The problem persists on the current nightly, Version 1.12.0-DEV.629 (2024-05-30).

May 30 '24 12:05 jonas-schulze

Works for me on AMD EPYC 9654: https://github.com/JuliaLang/julia/issues/54025#issuecomment-2294994413

Aug 17 '24 21:08 giordano

Please reopen if still broken.

Sep 17 '24 18:09 ViralBShah