ldc Wrong code for -O1 with loops `fabs` and floating point math

This code:

import std.math : abs;

void main()
{
    double[2] v = [-1, 0];
    double maxip = 0;
    for (int d = 0; d < 2; d++)
    {
        double[2] w = [0, 1];
        w[d] = 1;
        double ip = v[0] * w[0] + v[1] * w[1];
        if (abs(ip) > abs(maxip))
            maxip = ip;
    }
    assert(maxip < 0);
}

succeeds when compiled without options, which is expected. When compiled with -O1 the assertion trips.

LDC - the LLVM D compiler (1.41.0):
  based on DMD v2.111.0 and LLVM 20.1.5
  built with LDC - the LLVM D compiler (1.41.0)
  Default target: x86_64-pc-windows-msvc
  Host CPU: znver5

Oct 22 '25 08:10 veelo

manually unrolling the loop fixes it.

pragma(LDC_intrinsic, "llvm.fabs.f#")
    T llvm_fabs(T)(T val)
        if (__traits(isFloating, T));
pragma(LDC_no_moduleinfo);

extern(C) int main(int, char**)
{
    double[2] v = [-1.0, 0.0];
    double maxip = 0;
    int d;
    version(none)
    for (d = 0; d < 2; d++)
    {
        double[2] w = [0.0, 1.0];
        w[d] = 1.0;
        double ip = v[0] * w[0] + v[1] * w[1];
        if (llvm_fabs(ip) > llvm_fabs(maxip))
            maxip = ip;
    }
    else
    {{
        double[2] w = [0.0, 1.0];
        w[0] = 1.0;
        double ip = v[0] * w[0] + v[1] * w[1];
        if (llvm_fabs(ip) > llvm_fabs(maxip))
            maxip = ip;
                
    }{
        double[2] w = [0.0, 1.0];
        w[1] = 1.0;
        double ip = v[0] * w[0] + v[1] * w[1];
        if (llvm_fabs(ip) > llvm_fabs(maxip))
            maxip = ip;
    }}
    return maxip >= 0;
}

passes, replace all with none and it fails. Clang succeeds with equivalent code (but requires -O2 to optimise to a single return value)

Oct 22 '25 09:10 thewilsonator

and most strangely --output-ll of the failing code passed to opt -O1 passes! but done directly with LDC fails

Oct 22 '25 09:10 thewilsonator

OK, given the shell script

../ldc2-1.41.0-osx-arm64/bin/ldc2 -O1 ./gh4997.d --output-ll --print-module-scope --print-before-pass-number=$1 2> test.ll
 ../llvm/llvm-build/bin/llc ./test.ll && as ./test.s && ld test.o && ./a.out
 echo $?

% ./gh4997.sh 47
1
% ./gh4997.sh 46
0

corresponding to pass numbers --print-pass-numbers

Running pass 1 Annotation2MetadataPass on [module]
 Running pass 2 ForceFunctionAttrsPass on [module]
 Running pass 3 InferFunctionAttrsPass on [module]
 Running pass 4 CoroEarlyPass on [module]
 Running pass 5 EntryExitInstrumenterPass on main
 Running pass 6 LowerExpectIntrinsicPass on main
 Running pass 7 SimplifyCFGPass on main
 Running pass 8 SROAPass on main
 Running pass 9 EarlyCSEPass on main
 Running pass 10 OpenMPOptPass on [module]
 Running pass 11 IPSCCPPass on [module]
 Running pass 12 CalledValuePropagationPass on [module]
 Running pass 13 GlobalOptPass on [module]
 Running pass 14 PromotePass on main
 Running pass 15 InstCombinePass on main
 Running pass 16 SimplifyCFGPass on main
 Running pass 17 AlwaysInlinerPass on [module]
 Running pass 18 RequireAnalysisPass<llvm::GlobalsAA, llvm::Module, llvm::AnalysisManager<Module>> on [module]
 Running pass 19 InvalidateAnalysisPass<llvm::AAManager> on main
 Running pass 20 RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module, llvm::AnalysisManager<Module>> on [module]
 Running pass 21 InlinerPass on (main)
 Running pass 22 PostOrderFunctionAttrsPass on (main)
 Running pass 23 SROAPass on main
 Running pass 24 EarlyCSEPass on main
 Running pass 25 SimplifyCFGPass on main
 Running pass 26 InstCombinePass on main
 Running pass 27 LibCallsShrinkWrapPass on main
 Running pass 28 SimplifyCFGPass on main
 Running pass 29 ReassociatePass on main
 Running pass 30 LoopSimplifyPass on main
 Running pass 31 LCSSAPass on main
 Running pass 32 LoopInstSimplifyPass on loop %forcond in function main
 Running pass 33 LoopSimplifyCFGPass on loop %forcond in function main
 Running pass 34 LICMPass on loop %forcond in function main
 Running pass 35 LoopRotatePass on loop %forcond in function main
 Running pass 36 LICMPass on loop %forbody in function main
 Running pass 37 SimpleLoopUnswitchPass on loop %forbody in function main
 Running pass 38 SimplifyCFGPass on main
 Running pass 39 InstCombinePass on main
 Running pass 40 LoopSimplifyPass on main
 Running pass 41 LCSSAPass on main
 Running pass 42 LoopIdiomRecognizePass on loop %forbody in function main
 Running pass 43 IndVarSimplifyPass on loop %forbody in function main
 Running pass 44 LoopIdiomVectorizePass on loop %forbody in function main
 Running pass 45 LoopDeletionPass on loop %forbody in function main
 Running pass 46 LoopFullUnrollPass on loop %forbody in function main
 Running pass 47 SROAPass on main
 Running pass 48 MemCpyOptPass on main

Oct 22 '25 10:10 thewilsonator

This is looking like it is LLVMs fault for miscompilation, might take a while to get a fix.

Oct 22 '25 10:10 thewilsonator

https://github.com/llvm/llvm-project/issues/164617

Oct 22 '25 12:10 thewilsonator

Thanks a lot, Nicholas. I'll be following this eagerly.

Oct 22 '25 12:10 veelo

Pleasure. How serious is this bug for you? Given that opt succeeds with its pipeline, but ldc doesn't this could in theory be "fixed" with a change in the optimisation pipeline of LDC to bring it in line with opt, but that would require some serious code performance statistics.

Oct 22 '25 12:10 thewilsonator

I may be able to work around by unrolling the loop that this was reduced from. I will have a look. But there may be many such loops, and discovering all of them is not feasible. I don't think we can release with this bug, or without optimizations; in that regard I currently think it is rather serious.

I don't know how this works, but would it be possible to push this optimization pass to -O3 or -O4 even, so that we could ship with optimizations but not this one? And what is opt? With statistics you mean analysing what performance would be left on the table, and whether that is acceptable?

Oct 22 '25 13:10 veelo

With statistics you mean analysing what performance would be left on the table, and whether that is acceptable?

No I mean changing optimisation pipeline will affect the performance of all generated code: yours, Weka, symmetry, ...

opt is the tool that is released with llvm to optimise llvm IR, https://llvm.org/docs/CommandGuide/opt.html

but would it be possible to push this optimization pass to -O3 or -O4

I don't know. Scalar replacements of aggregates (SROA, which I'm pretty sure is the optimisation pass that is messing things up here) is a rather basic and commonplace optimisation, so it is surprising to me that it has such a bug in it. I suspect it is some corner case of it. I think it would probably be easiest to add --output-bc to your build script, and invoke opt on the resulting files and then link them together, and if that doesn't work for whatever reason it would be very useful to confirm. FWIW clang compiles the equivalent code just fine, so it is possible to get that code to work. If you have trouble figuring out how to get that working let me know.

Oct 22 '25 13:10 thewilsonator

Looks like it has been fixed in LLVM 21 https://github.com/llvm/llvm-project/issues/164617#issuecomment-3432393324

So it should be fixed with the next release of LDC

Oct 22 '25 13:10 thewilsonator

@kinke would it be possible to release a beta, with current Dub and based on LLVM 21? Perhaps the outdated tag issue needs to be solved first (@dkorpel)?

Oct 22 '25 14:10 veelo

See #4990 / #4979 LDC does not yet compile with LLVM 21, and that does not count any code issues that may arise. Once it compiles and any issues are worked out then we could certainly do one regardless of the shape of DMD releases are in, but there is still some work to be done. I'll try to get #4979 cleaned up soon.

Oct 22 '25 14:10 thewilsonator

Much appreciated, good luck.

Oct 22 '25 14:10 veelo