Trixi.jl
Add tests using other real types
We should check whether everything works using
- `Float32`
- `Float64` (current default)
- Types from MultiFloats.jl or Quadmath.jl
It would also be nice to see the impact on performance.
Tasks
- [ ] Convert `0.5 *` or `1/2 *` to `0.5f0 *`
- [ ] Use `oftype` and `convert` in initial conditions, source terms etc.
- [ ] Implement `ln_mean` for `Float32`
- [ ] Make mesh types flexible enough
- [ ] Add unit tests for numerical fluxes
- [ ] Add new tests for full elixirs
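Regarding the `ln_mean` task, here is a minimal sketch of a type-generic logarithmic mean that works for `Float32` as well as `Float64`. The name `ln_mean_sketch`, the series cutoff `1.0e-4`, and the expansion order are illustrative assumptions, not Trixi.jl's actual implementation:

```julia
# Hypothetical sketch of a type-generic logarithmic mean; NOT the
# actual Trixi.jl `ln_mean` implementation.
function ln_mean_sketch(x::T, y::T) where {T <: AbstractFloat}
    f = (y - x) / (y + x)
    u = f^2
    if u < convert(T, 1.0e-4)
        # Series expansion of (y - x) / log(y / x) for y ≈ x to avoid
        # catastrophic cancellation in log(y) - log(x).
        return (x + y) / (2 * (1 + u / 3 + u^2 / 5 + u^3 / 7))
    else
        return (y - x) / (log(y) - log(x))
    end
end
```

Since all constants are integers, rationals of integers, or explicitly converted, the result type matches the argument type for any `AbstractFloat`.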
As far as I can tell, MultiFloats.jl does not support transcendental functions, so maybe Quadmath.jl makes more sense for a general-purpose high-precision float. OTOH, MultiFloats.jl seems to be much better supported. `Float32` would definitely be interesting as well!
This will require some changes because of Julia's automatic type promotion for mixed-precision floating-point arithmetic. For example, this flux calculation includes multiplication with `Float64` numeric literals, which will coerce any lower-precision arguments to `Float64`. For this to work seamlessly, I'd recommend switching floating-point literals to integer literals, `Rational`s, or wrapping them with a parameterized `convert(T, x)` whenever possible. At the very least, the returned `SVector` should be parameterized to match the floating-point type of the arguments `u_ll`, `u_rr` to ensure the returned type does not differ from the input type.
For example,
```julia
julia> f(x) = x * 1//2
f (generic function with 1 method)

julia> f(2.0)
1.0

julia> f(2.0f0)
1.0f0

julia> f(Float16(2.0))
Float16(1.0)
```
The machine code for e.g. the `Float32` method is identical to what's generated for `g(x) = x * 0.5f0`:
```julia
julia> @code_native f(2.0f0)
        .text
; ┌ @ REPL[3]:1 within `f'
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $.rodata.cst4, %rax
; │┌ @ promotion.jl:322 within `*' @ float.jl:331
        vmulss  (%rax), %xmm0, %xmm0
; │└
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)
; └

julia> @code_native g(2.0f0)
        .text
; ┌ @ REPL[21]:1 within `g'
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $.rodata.cst4, %rax
; │┌ @ float.jl:331 within `*'
        vmulss  (%rax), %xmm0, %xmm0
; │└
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)
; └
```
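To make the literal-style recommendations concrete, here is a small sketch of three equivalent, type-stable ways to write "multiply by one half". The function names are illustrative, not from Trixi.jl:

```julia
# Three type-stable ways to avoid promoting Float32 (or Float16)
# inputs to Float64; function names are illustrative.
half_int(x)  = x / 2                        # integer literal
half_rat(x)  = x * 1//2                     # rational constant
half_conv(x) = x * convert(typeof(x), 0.5)  # parameterized convert

# All three preserve the input's floating-point type.
for h in (half_int, half_rat, half_conv)
    @assert h(2.0f0) === 1.0f0
    @assert h(Float16(2)) === Float16(1)
end
```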
Thanks for your input, @stillyslalom! Sure, that's part of the process. The first step will be to fix everything such that we can run simulations using plain `Float32` everywhere without internal promotion to `Float64`. Do you happen to know whether there is some tool to check whether `Float64`s appear in intermediate calculations that should mostly use `Float32`s (without inspecting the source or assembly code, but automatically, like a hypothetical `@ensure_float32_math_only`)?
Of course, it would be a great PR to
- convert methods to use generic types
- check that the performance is not impacted
- add tests to verify desired properties
Right now, I hesitate a bit to use rational constants. Unless (something like) https://github.com/JuliaLang/julia/pull/41258 is merged, there can and will be surprising problems. For example, the case of one half reported by @stillyslalom above is fine
```julia
julia> f(x) = x * 1//2
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        movabs  rax, offset .rodata.cst8
        vmulsd  xmm0, xmm0, qword ptr [rax]
        ret
        nop
```
but other rational constants will be problematic, e.g.
```julia
julia> f(x) = x * 1//3
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        sub     rsp, 40
        vmovsd  qword ptr [rsp + 8], xmm0
        movabs  rax, offset divgcd
        lea     rdi, [rsp + 16]
        mov     esi, 1
        mov     edx, 3
        call    rax
        mov     rax, qword ptr [rsp + 24]
        test    rax, rax
        js      L54
        mov     rcx, qword ptr [rsp + 16]
        jmp     L66
L54:
        neg     rax
        js      L91
        xor     ecx, ecx
        sub     rcx, qword ptr [rsp + 16]
L66:
        vcvtsi2sd       xmm0, xmm1, rcx
        vcvtsi2sd       xmm1, xmm1, rax
        vdivsd  xmm0, xmm0, xmm1
        vmulsd  xmm0, xmm0, qword ptr [rsp + 8]
        add     rsp, 40
        ret
L91:
        movabs  rax, offset jl_system_image_data
        mov     qword ptr [rsp + 32], rax
        movabs  rax, offset jl_invoke
        movabs  rdi, offset jl_system_image_data
        lea     rsi, [rsp + 32]
        movabs  rcx, 140365791294448
        mov     edx, 1
        call    rax
        ud2
        nop     word ptr cs:[rax + rax]
```
I don't really like the (only?) ways to get good code using something like
```julia
julia> f(x::T) where {T} = x * (T(1) / T(3))
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        movabs  rax, offset .rodata.cst8
        vmulsd  xmm0, xmm0, qword ptr [rax]
        ret
        nop

julia> f(x) = x * Base.unsafe_rational(Int, 1, 3)
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        movabs  rax, offset .rodata.cst8
        vmulsd  xmm0, xmm0, qword ptr [rax]
        ret
        nop
```
since that's considerably less readable, I think.
I agree. I could live with writing fractions using the `//` syntax, but if it goes beyond that in terms of complexity, we'd need a very convincing use case first. In either case, we'd have to have checks in place that ensure that the other real type does not get accidentally clobbered by `Float64` in a new function implemented by an unsuspecting user.
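Such a check could be as simple as a unit test asserting that a function preserves the element type of its inputs. A minimal sketch, where `my_flux` is an illustrative stand-in rather than an actual Trixi.jl flux:

```julia
# Sketch of a "no accidental Float64 clobbering" unit test;
# `my_flux` is an illustrative stand-in for a real flux function.
my_flux(u) = u * 1//2 + u^2 * 1//3

# Would fail if a Float64 literal sneaked into `my_flux`.
for T in (Float16, Float32, Float64)
    @assert my_flux(one(T)) isa T
end
```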
Now that https://github.com/JuliaLang/julia/pull/41258 is merged, we might be able to use this form of providing constants in Julia v1.8 (or whenever the change is included in a new release).
Edit: I guess it's v1.8 - the change is already available in the nightly builds:
```julia
julia> f(x) = x * 1//2
f (generic function with 1 method)

julia> @code_native debuginfo=:none f(1.5)
        .text
        .file   "f"
        .section        .rodata.cst8,"aM",@progbits,8
        .p2align        3                               # -- Begin function julia_f_54
.LCPI0_0:
        .quad   0x3fe0000000000000              # double 0.5
        .text
        .globl  julia_f_54
        .p2align        4, 0x90
        .type   julia_f_54,@function
julia_f_54:                             # @julia_f_54
        .cfi_startproc
# %bb.0:                                # %top
        movabsq $.LCPI0_0, %rax
        vmulsd  (%rax), %xmm0, %xmm0
        retq
.Lfunc_end0:
        .size   julia_f_54, .Lfunc_end0-julia_f_54
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

julia> f(x) = x * 2//3
f (generic function with 1 method)

julia> @code_native debuginfo=:none f(1.5)
        .text
        .file   "f"
        .section        .rodata.cst8,"aM",@progbits,8
        .p2align        3                               # -- Begin function julia_f_109
.LCPI0_0:
        .quad   0x3fe5555555555555              # double 0.66666666666666663
        .text
        .globl  julia_f_109
        .p2align        4, 0x90
        .type   julia_f_109,@function
julia_f_109:                            # @julia_f_109
        .cfi_startproc
# %bb.0:                                # %top
        movabsq $.LCPI0_0, %rax
        vmulsd  (%rax), %xmm0, %xmm0
        retq
.Lfunc_end0:
        .size   julia_f_109, .Lfunc_end0-julia_f_109
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

julia> versioninfo()
Julia Version 1.8.0-DEV.1125
Commit fa8be6fda5 (2021-12-12 19:26 UTC)
```
Our current code doesn't work well with `Float32` on GPUs, see https://github.com/huiyuxie/trixi_cuda/issues/3. I tried using rational constants such as `(1//2)` instead but got problems such as
```julia
julia> surface_flux_kernel1 = @cuda launch = false surface_flux_kernel1!(surface_flux_arr, interfaces_u, orientations, equations, surface_flux)
ERROR: InvalidIRError: compiling MethodInstance for surface_flux_kernel1!(::CuDeviceArray{Float32, 5, 1}, ::CuDeviceArray{Float32, 5, 1}, ::CuDeviceVector{Int32, 1}, ::CompressibleEulerEquations3D{Float32}, ::FluxLaxFriedrichs{typeof(max_abs_speed_naive)}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to __throw_rational_argerror_typemin(T) @ Base rational.jl:20)
Stacktrace:
 [1] checked_den
   @ ./rational.jl:24
 [2] Rational
   @ ./rational.jl:36
 [3] Rational
   @ ./rational.jl:39
 [4] //
   @ ./rational.jl:62
 [5] flux
   @ /tmp/trixi_cuda/trixi/src/equations/compressible_euler_3d.jl:379
 [6] flux_central
   @ /tmp/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:20
...
```
Thus, it seems to be better to use `0.5f0` instead of `0.5` or `(1//2)`. However, it also makes the code a bit harder to read. If we really want to use GPUs and/or nice `Float32` operations on CPUs, we have to approach this problem. The only options I see are
- Write `0.5f0` etc.
- Let a macro do the job and wrap files in `@muladd @trixi_special_macro begin ... end` blocks - or just `@trixi_special_macro begin ... end` adding `@muladd` automatically?
Any thoughts, @trixi-framework/developers?
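For what it's worth, a literal-rewriting macro along the lines of the second option could look roughly like the following sketch. `@float32_literals` and `rewrite_literals` are hypothetical names; ChangePrecision.jl implements this idea properly:

```julia
# Hypothetical sketch of a literal-rewriting macro (illustrative names;
# not an actual Trixi.jl macro). It walks the expression tree and
# replaces every Float64 literal with its Float32 counterpart.
rewrite_literals(x) = x
rewrite_literals(x::Float64) = Float32(x)
rewrite_literals(ex::Expr) = Expr(ex.head, map(rewrite_literals, ex.args)...)

macro float32_literals(expr)
    esc(rewrite_literals(expr))
end

# The 0.5 literal is rewritten to 0.5f0 at macro-expansion time.
@float32_literals g(x) = x * 0.5

@assert g(2.0f0) === 1.0f0   # stays Float32
@assert g(2.0) === 1.0       # Float64 inputs still work
```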
Result from a local discussion: Maybe a macro? And maybe with the option to be disabled via Preferences.jl?
Xref https://github.com/omlins/ParallelStencil.jl/blob/8530bfd12d99c12da27ffee58d95307a78f59348/src/ParallelKernel/parallel.jl#L250-L291 and https://github.com/JuliaMath/ChangePrecision.jl
After thinking more about it, I tend to prefer an explicit solution where we write `0.5f0` instead of `0.5` etc.
We have discussed this again and agreed to proceed as follows:
- Convert `0.5` to `0.5f0` etc.
- Add unit tests for fluxes etc.
- Add bigger tests - maybe using LIKWID?
- Write explicit conversions for magic constants like
  https://github.com/trixi-framework/Trixi.jl/blob/b1a84a63298966b67491dacac78e785e0dbfb36c/src/solvers/dgsem_tree/indicators.jl#L116-L118
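For the last point, such an explicit conversion could look like this sketch. The name `magic_threshold` and the constant value are illustrative, not taken from the linked indicators.jl code:

```julia
# Hypothetical sketch of converting a magic constant to the solver's
# real type; name and value are illustrative, not from Trixi.jl.
magic_threshold(::Type{RealT}) where {RealT <: AbstractFloat} =
    convert(RealT, 0.0001)

# The constant now matches the requested precision instead of
# promoting downstream arithmetic to Float64.
@assert magic_threshold(Float32) isa Float32
@assert magic_threshold(Float64) isa Float64
```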