Nicholas Frechette comments

Results: 89 comments of Nicholas Frechette

feat(decompression): improved ACL asserts when initializing a decompression_context

Yeah, AppVeyor has been struggling. The builds had been hovering around 50-55 minutes for some time now, and it looks like a configuration change on their end pushed it over the...

feat(decompression): improved ACL asserts when initializing a decompression_context

@all-contributors add @Yusuf-PG as code

Optimized the performance of float object

Hello and thank you for the contribution! I apologize for the late reply; I am just coming back from a trip abroad. These changes to argument passing are quite subtle...

Optimized the performance of float object

Thank you for the clarification. I will see if I can add a benchmark test based on your sample and see if I can reproduce locally. What kind of processors/android...

Optimized the performance of float object

Yes, those failures are probably due to a known compiler/toolchain issue, see this PR for details: https://github.com/nfrechette/rtm/pull/212. I wouldn't worry about it for now. I'm waiting for GitHub to update...

Optimized the performance of float object

I added a benchmark to profile argument passing for matrix3x3f here: https://github.com/nfrechette/rtm/pull/219. On my M1 laptop, passing by value is a clear winner and the assembly generated by Apple clang...

Optimized the performance of float object

The CI also ran my benchmark on x64 SSE2 with clang 14 and we can see there that the calling convention not returning aggregates by register indeed causes performance issues:...

Optimized the performance of float object

Here are some more notes profiling argument passing on my Zen2 desktop. With VS2022 SSE2 and `__vectorcall`, the results are as follows: ``` bm_matrix3x3_arg_passing_current 18.8 ns 18.4 ns 37333333 bm_matrix3x3_arg_passing_ref...

Optimized the performance of float object

Thank you for taking the time to dig deeper :) Writing synthetic benchmarks is as much art as it is science. It is not trivial, especially for simple low-level...

Optimized the performance of float object

Out of curiosity, I also added the same benchmark for matrix3x3d to see. ``` bm_matrix3x3d_arg_passing_current 34.6 ns 34.6 ns 20240340 bm_matrix3x3d_arg_passing_ref 25.6 ns 25.5 ns 27587077 bm_matrix3x3d_arg_passing_value 34.3 ns 34.3...