Wrong code for -O1 with loops `fabs` and floating point math
This code:
import std.math : abs;
void main()
{
double[2] v = [-1, 0];
double maxip = 0;
for (int d = 0; d < 2; d++)
{
double[2] w = [0, 1];
w[d] = 1;
double ip = v[0] * w[0] + v[1] * w[1];
if (abs(ip) > abs(maxip))
maxip = ip;
}
assert(maxip < 0);
}
succeeds when compiled without options, which is expected. When compiled with -O1 the assertion trips.
LDC - the LLVM D compiler (1.41.0):
based on DMD v2.111.0 and LLVM 20.1.5
built with LDC - the LLVM D compiler (1.41.0)
Default target: x86_64-pc-windows-msvc
Host CPU: znver5
manually unrolling the loop fixes it.
pragma(LDC_intrinsic, "llvm.fabs.f#")
T llvm_fabs(T)(T val)
if (__traits(isFloating, T));
pragma(LDC_no_moduleinfo);
extern(C) int main(int, char**)
{
double[2] v = [-1.0, 0.0];
double maxip = 0;
int d;
version(none)
for (d = 0; d < 2; d++)
{
double[2] w = [0.0, 1.0];
w[d] = 1.0;
double ip = v[0] * w[0] + v[1] * w[1];
if (llvm_fabs(ip) > llvm_fabs(maxip))
maxip = ip;
}
else
{{
double[2] w = [0.0, 1.0];
w[0] = 1.0;
double ip = v[0] * w[0] + v[1] * w[1];
if (llvm_fabs(ip) > llvm_fabs(maxip))
maxip = ip;
}{
double[2] w = [0.0, 1.0];
w[1] = 1.0;
double ip = v[0] * w[0] + v[1] * w[1];
if (llvm_fabs(ip) > llvm_fabs(maxip))
maxip = ip;
}}
return maxip >= 0;
}
passes, replace all with none and it fails. Clang succeeds with equivalent code (but requires -O2 to optimise to a single return value)
and most strangely --output-ll of the failing code passed to opt -O1 passes! but done directly with LDC fails
OK, given the shell script
../ldc2-1.41.0-osx-arm64/bin/ldc2 -O1 ./gh4997.d --output-ll --print-module-scope --print-before-pass-number=$1 2> test.ll
../llvm/llvm-build/bin/llc ./test.ll && as ./test.s && ld test.o && ./a.out
echo $?
% ./gh4997.sh 47
1
% ./gh4997.sh 46
0
corresponding to pass numbers --print-pass-numbers
Running pass 1 Annotation2MetadataPass on [module]
Running pass 2 ForceFunctionAttrsPass on [module]
Running pass 3 InferFunctionAttrsPass on [module]
Running pass 4 CoroEarlyPass on [module]
Running pass 5 EntryExitInstrumenterPass on main
Running pass 6 LowerExpectIntrinsicPass on main
Running pass 7 SimplifyCFGPass on main
Running pass 8 SROAPass on main
Running pass 9 EarlyCSEPass on main
Running pass 10 OpenMPOptPass on [module]
Running pass 11 IPSCCPPass on [module]
Running pass 12 CalledValuePropagationPass on [module]
Running pass 13 GlobalOptPass on [module]
Running pass 14 PromotePass on main
Running pass 15 InstCombinePass on main
Running pass 16 SimplifyCFGPass on main
Running pass 17 AlwaysInlinerPass on [module]
Running pass 18 RequireAnalysisPass<llvm::GlobalsAA, llvm::Module, llvm::AnalysisManager<Module>> on [module]
Running pass 19 InvalidateAnalysisPass<llvm::AAManager> on main
Running pass 20 RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module, llvm::AnalysisManager<Module>> on [module]
Running pass 21 InlinerPass on (main)
Running pass 22 PostOrderFunctionAttrsPass on (main)
Running pass 23 SROAPass on main
Running pass 24 EarlyCSEPass on main
Running pass 25 SimplifyCFGPass on main
Running pass 26 InstCombinePass on main
Running pass 27 LibCallsShrinkWrapPass on main
Running pass 28 SimplifyCFGPass on main
Running pass 29 ReassociatePass on main
Running pass 30 LoopSimplifyPass on main
Running pass 31 LCSSAPass on main
Running pass 32 LoopInstSimplifyPass on loop %forcond in function main
Running pass 33 LoopSimplifyCFGPass on loop %forcond in function main
Running pass 34 LICMPass on loop %forcond in function main
Running pass 35 LoopRotatePass on loop %forcond in function main
Running pass 36 LICMPass on loop %forbody in function main
Running pass 37 SimpleLoopUnswitchPass on loop %forbody in function main
Running pass 38 SimplifyCFGPass on main
Running pass 39 InstCombinePass on main
Running pass 40 LoopSimplifyPass on main
Running pass 41 LCSSAPass on main
Running pass 42 LoopIdiomRecognizePass on loop %forbody in function main
Running pass 43 IndVarSimplifyPass on loop %forbody in function main
Running pass 44 LoopIdiomVectorizePass on loop %forbody in function main
Running pass 45 LoopDeletionPass on loop %forbody in function main
Running pass 46 LoopFullUnrollPass on loop %forbody in function main
Running pass 47 SROAPass on main
Running pass 48 MemCpyOptPass on main
This is looking like it is LLVMs fault for miscompilation, might take a while to get a fix.
https://github.com/llvm/llvm-project/issues/164617
Thanks a lot, Nicholas. I'll be following this eagerly.
Pleasure. How serious is this bug for you? Given that opt succeeds with its pipeline, but ldc doesn't this could in theory be "fixed" with a change in the optimisation pipeline of LDC to bring it in line with opt, but that would require some serious code performance statistics.
I may be able to work around by unrolling the loop that this was reduced from. I will have a look. But there may be many such loops, and discovering all of them is not feasible. I don't think we can release with this bug, or without optimizations; in that regard I currently think it is rather serious.
I don't know how this works, but would it be possible to push this optimization pass to -O3 or -O4 even, so that we could ship with optimizations but not this one? And what is opt? With statistics you mean analysing what performance would be left on the table, and whether that is acceptable?
With statistics you mean analysing what performance would be left on the table, and whether that is acceptable?
No I mean changing optimisation pipeline will affect the performance of all generated code: yours, Weka, symmetry, ...
opt is the tool that is released with llvm to optimise llvm IR, https://llvm.org/docs/CommandGuide/opt.html
but would it be possible to push this optimization pass to -O3 or -O4
I don't know. Scalar replacements of aggregates (SROA, which I'm pretty sure is the optimisation pass that is messing things up here) is a rather basic and commonplace optimisation, so it is surprising to me that it has such a bug in it. I suspect it is some corner case of it.
I think it would probably be easiest to add --output-bc to your build script, and invoke opt on the resulting files and then link them together, and if that doesn't work for whatever reason it would be very useful to confirm. FWIW clang compiles the equivalent code just fine, so it is possible to get that code to work. If you have trouble figuring out how to get that working let me know.
Looks like it has been fixed in LLVM 21 https://github.com/llvm/llvm-project/issues/164617#issuecomment-3432393324
So it should be fixed with the next release of LDC
@kinke would it be possible to release a beta, with current Dub and based on LLVM 21? Perhaps the outdated tag issue needs to be solved first (@dkorpel)?
See #4990 / #4979 LDC does not yet compile with LLVM 21, and that does not count any code issues that may arise. Once it compiles and any issues are worked out then we could certainly do one regardless of the shape of DMD releases are in, but there is still some work to be done. I'll try to get #4979 cleaned up soon.
Much appreciated, good luck.