
Add tests using other real types

Open ranocha opened this issue 3 years ago • 16 comments

We should check whether everything works using real types other than the default Float64, e.g., Float32 or high-precision types such as the ones provided by MultiFloats.jl or Quadmath.jl.

It would also be nice to see the impact on performance.

Tasks

  • [ ] Convert 0.5 * or 1/2 * to 0.5f0 *
  • [ ] Use oftype and convert in initial conditions, source terms etc.
  • [ ] Implement ln_mean for Float32
  • [ ] Make mesh types flexible enough
  • [ ] Add unit tests for numerical fluxes
  • [ ] Add new tests for full elixirs
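
A sketch of the `oftype`/`convert` task from the list above (the function name follows Trixi.jl conventions, but this is illustrative code, not the actual implementation):

```julia
# Illustrative sketch: keep an initial condition type-stable for any real type
# by converting the constants instead of writing Float64 literals.
function initial_condition_convergence_test(x, t)
    RealT = eltype(x)
    c = convert(RealT, 2)     # instead of the Float64 literal 2.0
    A = convert(RealT, 1//10) # instead of 0.1
    rho = c + A * sin(x[1] - t)
    return rho
end
```

With this pattern, `initial_condition_convergence_test(Float32[0.5], 0.0f0)` returns a `Float32`, while Float64 input still yields `Float64`.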

ranocha avatar May 22 '21 07:05 ranocha

As far as I can tell, MultiFloats.jl does not support transcendental functions, so maybe Quadmath.jl makes more sense for a general-purpose high-precision float. OTOH, MultiFloats.jl seems to be much better supported.

Float32 would definitely be interesting as well!

sloede avatar May 22 '21 09:05 sloede

This will require some changes because of Julia's automatic type promotion for mixed-precision floating point arithmetic. For example, this flux calculation includes multiplication with Float64 numeric literals which will coerce any lower-precision arguments to Float64. For this to work seamlessly, I'd recommend switching floating-point literals to integer literals, Rationals, or wrapping them with a parameterized convert(T, x) whenever possible. At the very least, the returned SVector should be parameterized to match the floating-point type of the arguments u_ll, u_rr to ensure the returned type does not differ from the input type.

For example,

julia> f(x) = x * 1//2
f (generic function with 1 method)

julia> f(2.0)
1.0

julia> f(2.0f0)
1.0f0

julia> f(Float16(2.0))
Float16(1.0)

The machine code for e.g. the Float32 method is identical to what's generated for g(x) = x * 0.5f0:

julia> @code_native f(2.0f0)
        .text
; ┌ @ REPL[3]:1 within `f'
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $.rodata.cst4, %rax
; │┌ @ promotion.jl:322 within `*' @ float.jl:331
        vmulss  (%rax), %xmm0, %xmm0
; │└
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)
; └

julia> @code_native g(2.0f0)
        .text
; ┌ @ REPL[21]:1 within `g'
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $.rodata.cst4, %rax
; │┌ @ float.jl:331 within `*'
        vmulss  (%rax), %xmm0, %xmm0
; │└
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)
; └
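
For contrast, here is a minimal sketch of the promotion problem that the rational literal avoids: a Float64 literal silently coerces lower-precision arguments.

```julia
# A Float64 literal promotes Float32 (and Float16) arguments to Float64:
h(x) = 0.5 * x
h(2.0f0)         # returns a Float64, not a Float32

# The integer-rational form preserves the argument type:
f(x) = x * 1//2
f(2.0f0)         # returns a Float32
```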

stillyslalom avatar Jun 17 '21 19:06 stillyslalom

Thanks for your input, @stillyslalom! Sure, that's part of the process. The first step will be to fix everything such that we can run simulations using plain Float32 everywhere without internal promotion to Float64. Do you happen to know whether there is some tool to check whether Float64s appear in intermediate calculations which should mostly use Float32s (without checking the source or assembly code, but automatic like @ensure_float32_math_only)?
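
I'm not aware of a ready-made `@ensure_float32_math_only`, but a cheap partial check for unit tests would be asserting that a kernel's output type matches its input type, since internal promotion to Float64 usually leaks into the return value (this is a hypothetical helper, not an existing API; it cannot catch Float64 intermediates that are truncated back to Float32 before returning):

```julia
# Hypothetical helper: call `f` with Float32 inputs and check that the
# result is still Float32. Catches promotion that leaks into the return
# type, but not intermediates converted back before returning.
function preserves_float32(f, args...)
    result = f(map(x -> Float32.(x), args)...)
    return eltype(result) == Float32
end

preserves_float32(x -> x * 1//2, 2.0)  # true
preserves_float32(x -> x * 0.5, 2.0)   # false: the 0.5 literal promotes
```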

ranocha avatar Jun 18 '21 03:06 ranocha

Of course, it would be a great PR to

  • convert methods to use generic types
  • check that the performance is not impacted
  • add tests to verify desired properties
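
A unit test for the last point might look like the following sketch (`flux_test` is a stand-in for an actual numerical flux, which in Trixi.jl also takes an orientation and the equations; only the type-stability property being tested is real here):

```julia
using Test

# Stand-in for a real numerical flux; illustrates the property we want:
# the returned value has the same real type as the inputs.
flux_test(u_ll, u_rr) = (u_ll + u_rr) * 1//2

@testset "flux preserves real type" begin
    for RealT in (Float32, Float64)
        u_ll, u_rr = RealT(1//10), RealT(1//5)
        @test flux_test(u_ll, u_rr) isa RealT
        @test @inferred(flux_test(u_ll, u_rr)) isa RealT
    end
end
```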

ranocha avatar Jun 18 '21 04:06 ranocha

Right now, I hesitate a bit to use rational constants. Unless (something like) https://github.com/JuliaLang/julia/pull/41258 is merged, there can and will be surprising problems. For example, the case of one half reported by @stillyslalom above is fine

julia> f(x) = x * 1//2
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        movabs  rax, offset .rodata.cst8
        vmulsd  xmm0, xmm0, qword ptr [rax]
        ret
        nop

but other rational constants will be problematic, e.g.,

julia> f(x) = x * 1//3
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        sub     rsp, 40
        vmovsd  qword ptr [rsp + 8], xmm0
        movabs  rax, offset divgcd
        lea     rdi, [rsp + 16]
        mov     esi, 1
        mov     edx, 3
        call    rax
        mov     rax, qword ptr [rsp + 24]
        test    rax, rax
        js      L54
        mov     rcx, qword ptr [rsp + 16]
        jmp     L66
L54:
        neg     rax
        js      L91
        xor     ecx, ecx
        sub     rcx, qword ptr [rsp + 16]
L66:
        vcvtsi2sd       xmm0, xmm1, rcx
        vcvtsi2sd       xmm1, xmm1, rax
        vdivsd  xmm0, xmm0, xmm1
        vmulsd  xmm0, xmm0, qword ptr [rsp + 8]
        add     rsp, 40
        ret
L91:
        movabs  rax, offset jl_system_image_data
        mov     qword ptr [rsp + 32], rax
        movabs  rax, offset jl_invoke
        movabs  rdi, offset jl_system_image_data
        lea     rsi, [rsp + 32]
        movabs  rcx, 140365791294448
        mov     edx, 1
        call    rax
        ud2
        nop     word ptr cs:[rax + rax]

I don't really like the (only?) ways to get good code, which require something like

julia> f(x::T) where {T} = x * (T(1) / T(3))
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        movabs  rax, offset .rodata.cst8
        vmulsd  xmm0, xmm0, qword ptr [rax]
        ret
        nop

julia> f(x) = x * Base.unsafe_rational(Int, 1, 3)
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        movabs  rax, offset .rodata.cst8
        vmulsd  xmm0, xmm0, qword ptr [rax]
        ret
        nop

since that's considerably less readable, I think.

ranocha avatar Jun 19 '21 05:06 ranocha

I agree. I could live with writing fractions using the double-slash // syntax, but if it goes beyond that in terms of complexity, we'd need a very convincing use case first. In either case, we'd have to have checks in place to ensure that the other real type does not get accidentally clobbered by Float64 in a new function implemented by an unsuspecting user.

sloede avatar Jun 19 '21 07:06 sloede

Now that https://github.com/JuliaLang/julia/pull/41258 is merged, we might be able to use this form of providing constants in Julia v1.8 (or whenever the change is included in a new release).

Edit: I guess it's v1.8 - it's already available in the nightly builds

julia> f(x) = x * 1//2
f (generic function with 1 method)

julia> @code_native debuginfo=:none f(1.5)
        .text
        .file   "f"
        .section        .rodata.cst8,"aM",@progbits,8
        .p2align        3                               # -- Begin function julia_f_54
.LCPI0_0:
        .quad   0x3fe0000000000000              # double 0.5
        .text
        .globl  julia_f_54
        .p2align        4, 0x90
        .type   julia_f_54,@function
julia_f_54:                             # @julia_f_54
        .cfi_startproc
# %bb.0:                                # %top
        movabsq $.LCPI0_0, %rax
        vmulsd  (%rax), %xmm0, %xmm0
        retq
.Lfunc_end0:
        .size   julia_f_54, .Lfunc_end0-julia_f_54
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

julia> f(x) = x * 2//3
f (generic function with 1 method)

julia> @code_native debuginfo=:none f(1.5)
        .text
        .file   "f"
        .section        .rodata.cst8,"aM",@progbits,8
        .p2align        3                               # -- Begin function julia_f_109
.LCPI0_0:
        .quad   0x3fe5555555555555              # double 0.66666666666666663
        .text
        .globl  julia_f_109
        .p2align        4, 0x90
        .type   julia_f_109,@function
julia_f_109:                            # @julia_f_109
        .cfi_startproc
# %bb.0:                                # %top
        movabsq $.LCPI0_0, %rax
        vmulsd  (%rax), %xmm0, %xmm0
        retq
.Lfunc_end0:
        .size   julia_f_109, .Lfunc_end0-julia_f_109
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

julia> versioninfo()
Julia Version 1.8.0-DEV.1125
Commit fa8be6fda5 (2021-12-12 19:26 UTC)

ranocha avatar Jul 27 '21 13:07 ranocha

Our current code doesn't work well with Float32 on GPUs, see https://github.com/huiyuxie/trixi_cuda/issues/3. I tried using rational constants such as (1//2) instead but ran into problems such as

julia> surface_flux_kernel1 = @cuda launch = false surface_flux_kernel1!(surface_flux_arr, interfaces_u, orientations, equations, surface_flux)
ERROR: InvalidIRError: compiling MethodInstance for surface_flux_kernel1!(::CuDeviceArray{Float32, 5, 1}, ::CuDeviceArray{Float32, 5, 1}, ::CuDeviceVector{Int32, 1}, ::CompressibleEulerEquations3D{Float32}, ::FluxLaxFriedrichs{typeof(max_abs_speed_naive)}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to __throw_rational_argerror_typemin(T) @ Base rational.jl:20)
Stacktrace:
 [1] checked_den
   @ ./rational.jl:24
 [2] Rational
   @ ./rational.jl:36
 [3] Rational
   @ ./rational.jl:39
 [4] //
   @ ./rational.jl:62
 [5] flux
   @ /tmp/trixi_cuda/trixi/src/equations/compressible_euler_3d.jl:379
 [6] flux_central
   @ /tmp/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:20
...

Thus, it seems to be better to use 0.5f0 instead of 0.5 or (1//2). However, it also makes the code a bit uglier to read. If we really want to use GPUs and/or fast Float32 operations on CPUs, we have to tackle this problem. The only options I see are

  1. Write 0.5f0 etc.
  2. Let a macro do the job and wrap files in @muladd @trixi_special_macro begin ... end blocks - or just @trixi_special_macro begin ... end adding @muladd automatically?
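
A rough sketch of what such a macro could do (illustrative only, not an actual Trixi.jl macro, and similar in spirit to ChangePrecision.jl): walk the expression tree and replace every Float64 literal by its Float32 value.

```julia
# Replace Float64 literals in an expression tree by Float32 literals.
function demote_literals(ex)
    if ex isa Float64
        return Float32(ex)
    elseif ex isa Expr
        return Expr(ex.head, map(demote_literals, ex.args)...)
    else
        return ex
    end
end

macro demote_float64(ex)
    return esc(demote_literals(ex))
end

# Usage: `@demote_float64 0.5 * x` behaves like `0.5f0 * x`,
# so Float32 inputs stay Float32.
g(x) = @demote_float64 0.5 * x
```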

Any thoughts, @trixi-framework/developers?

ranocha avatar Jul 11 '23 11:07 ranocha

Result from a local discussion: Maybe a macro? And maybe with the option to be disabled via Preferences.jl?

ranocha avatar Jul 11 '23 14:07 ranocha

Xref https://github.com/omlins/ParallelStencil.jl/blob/8530bfd12d99c12da27ffee58d95307a78f59348/src/ParallelKernel/parallel.jl#L250-L291 and https://github.com/JuliaMath/ChangePrecision.jl

ranocha avatar Jul 12 '23 09:07 ranocha

After thinking more about it, I tend to prefer an explicit solution where we write 0.5f0 instead of 0.5 etc.

ranocha avatar Aug 07 '23 11:08 ranocha

We have discussed this again and agreed to proceed as follows:

  • Convert 0.5 to 0.5f0 etc.
  • Add unit tests for fluxes etc.
  • Add bigger tests - maybe using LIKWID?
  • Write explicit conversions for magic constants like

https://github.com/trixi-framework/Trixi.jl/blob/b1a84a63298966b67491dacac78e785e0dbfb36c/src/solvers/dgsem_tree/indicators.jl#L116-L118
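
A sketch of the last bullet (the function name and constant are illustrative, not the exact code at the linked lines): instead of computing with a bare Float64 literal, convert the magic constant to the solver's real type first.

```julia
# Convert the magic constant to the requested real type before using it,
# so the whole computation stays in that type.
function magic_parameter(::Type{RealT}) where {RealT}
    threshold = convert(RealT, 0.0001)
    return log((1 - threshold) / threshold)
end
```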

ranocha avatar Apr 16 '24 15:04 ranocha