Nicholas Frechette
Yeah, AppVeyor has been struggling. The builds had been hovering around 50-55 minutes for some time now, and it looks like a configuration change on their end pushed it over the...
@all-contributors add @Yusuf-PG as code
Hello and thank you for the contribution! I apologize for the late reply; I am just coming back from a trip abroad. These changes to argument passing are quite subtle...
Thank you for the clarification. I will see if I can add a benchmark test based on your sample and see if I can reproduce locally. What kind of processors/android...
Yes, those failures are probably due to a known compiler/toolchain issue; see this PR for details: https://github.com/nfrechette/rtm/pull/212 I wouldn't worry about it for now. I'm waiting for GitHub to update...
I added a benchmark to profile argument passing for matrix3x3f here: https://github.com/nfrechette/rtm/pull/219 On my M1 laptop, passing by value is a clear winner and the assembly generated by Apple clang...
The CI also ran my benchmark on x64 SSE2 with clang 14 and we can see there that the calling convention not returning aggregates by register indeed causes performance issues:...
Here are some more notes profiling argument passing on my Zen2 desktop. With VS2022 SSE2 and `__vectorcall`, the results are as follows: ``` bm_matrix3x3_arg_passing_current 18.8 ns 18.4 ns 37333333 bm_matrix3x3_arg_passing_ref...
Thank you for taking the time to dig deeper :) Writing synthetic benchmarks is as much art as it is science. It is not trivial, especially for simple low-level...
Out of curiosity, I also added the same benchmark for matrix3x3d to see. ``` bm_matrix3x3d_arg_passing_current 34.6 ns 34.6 ns 20240340 bm_matrix3x3d_arg_passing_ref 25.6 ns 25.5 ns 27587077 bm_matrix3x3d_arg_passing_value 34.3 ns 34.3...
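
For readers unfamiliar with the question being profiled in the comments above, here is a minimal sketch of the two argument passing styles being compared. It is not the benchmark from the linked PR and does not use RTM's types: `vec4`, `mat3`, `scale_by_value`, `scale_by_ref`, `ns_per_call`, and `SKETCH_NOINLINE` are all hypothetical names chosen for illustration. It assumes a 3x3 matrix stored as three 4-wide axes and disables inlining so that the calling convention actually comes into play.

```cpp
#include <cassert>
#include <chrono>

// Prevent inlining so that arguments and return values really cross a call boundary.
#if defined(_MSC_VER)
    #define SKETCH_NOINLINE __declspec(noinline)
#else
    #define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Hypothetical stand-ins, NOT RTM's vector4f/matrix3x3f.
struct vec4 { float x, y, z, w; };
struct mat3 { vec4 x_axis, y_axis, z_axis; };

static vec4 scale(const vec4& v, float s)
{
    return vec4{ v.x * s, v.y * s, v.z * s, v.w * s };
}

// Variant 1: the matrix is passed by value (registers or stack, depending on the ABI).
SKETCH_NOINLINE mat3 scale_by_value(mat3 m, float s)
{
    return mat3{ scale(m.x_axis, s), scale(m.y_axis, s), scale(m.z_axis, s) };
}

// Variant 2: the matrix is passed by const reference (a pointer to memory).
SKETCH_NOINLINE mat3 scale_by_ref(const mat3& m, float s)
{
    return mat3{ scale(m.x_axis, s), scale(m.y_axis, s), scale(m.z_axis, s) };
}

// Crude timing loop: feeds each result back in as the next input so the
// calls form a dependency chain, then returns average nanoseconds per call.
template <typename Fn>
double ns_per_call(Fn fn, int iterations)
{
    assert(iterations > 0);

    mat3 m = { { 1.0f, 0.0f, 0.0f, 0.0f },
               { 0.0f, 1.0f, 0.0f, 0.0f },
               { 0.0f, 0.0f, 1.0f, 0.0f } };

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        m = fn(m, 1.0f);
    const auto end = std::chrono::steady_clock::now();

    // Consume the result so the loop is not optimized away.
    volatile float sink = m.x_axis.x;
    (void)sink;

    return std::chrono::duration<double, std::nano>(end - start).count() / iterations;
}
```

Calling `ns_per_call(scale_by_value, n)` and `ns_per_call(scale_by_ref, n)` with the same `n` gives a rough comparison only. A dedicated benchmarking harness handles warm-up, repetition, and noise far better than this loop, and the outcome depends on the compiler, the target architecture, and the calling convention in use.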