ldc
ldc copied to clipboard
perf: optimize indexing in inner loops
Benchmark: https://github.com/attractivechaos/plb2/pull/6
According to note [1], clang is able to optimize them, shouldn't LDC be able to as well?
[1] - https://github.com/attractivechaos/plb2?tab=readme-ov-file#optimizing-inner-loops
Please add more details here. I have to search different pages and source code to figure out exactly what you mean and what difference you observe between Clang and LDC.
First thing that comes to mind is that for the D source, the optimizer is not able to prove that writing to c[i][j]
does not change contents a
or b
, i.e. that c
does not alias a
or b
; without that proof, it is invalid to hoist a[i][k]
or b[k]
from the loop. In the C source, the optimizer can see calls to malloc
and calloc
which guarantee that c
can never alias a
or b
. In D, the call to _d_newarraymiTX
(auto c = new double[][](m, p);
) is IIRC not recognized (nor marked) as a memory allocation function, thus the optimizer does not get the information to prove that things don't alias.
My first attempt to improve the situation would be to replace the allocation with something that the optimizer does recognize.
Please add more details here. I have to search different pages and source code to figure out exactly what you mean and what difference you observe between Clang and LDC.
Sorry, i thought using the diff would be enough, because my lack of expertise would have made me choose incorrect wording
https://github.com/attractivechaos/plb2/pull/6/files
Particularly the 2nd commit makes things even faster:
https://github.com/attractivechaos/plb2/pull/6/commits/7dc32f643ed786996c65dd0954bffebd6510e98b
Result in the comment in the 2nd paragraph: https://github.com/attractivechaos/plb2/pull/6#issue-2063736667
First thing that comes to mind is that for the D source, the optimizer is not able to prove that writing to
c[i][j]
does not change contentsa
orb
, i.e. thatc
does not aliasa
orb
; without that proof, it is invalid to hoista[i][k]
orb[k]
from the loop. In the C source, the optimizer can see calls tomalloc
andcalloc
which guarantee thatc
can never aliasa
orb
. In D, the call to_d_newarraymiTX
(auto c = new double[][](m, p);
) is IIRC not recognized (nor marked) as a memory allocation function, thus the optimizer does not get the information to prove that things don't alias. My first attempt to improve the situation would be to replace the allocation with something that the optimizer does recognize.
Thanks for the clear explanation
PR got merged, result now is equal to what C with clang produce https://github.com/attractivechaos/plb2?tab=readme-ov-file#appendix-timing-on-apple-m1-macbook-pro