ndarray
Investigate sum_3_azip's performance
The benchmark sum_3_azip seems to perform abysmally compared with the equivalent sum_3_azip_fold. Investigate why, and why the former doesn't autovectorize like the latter.
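For readers without the benchmark source at hand, here is a rough std-only analogue of the two accumulation shapes being compared. It is not the ndarray benchmark code: it uses plain slices and iterators rather than azip! or Zip, the function names are hypothetical, and it assumes i32 elements. It only shows the structural difference, a closure that mutates a captured accumulator versus one that threads the accumulator through a fold.

```rust
// Std-only analogue of the two benchmark shapes; no ndarray involved.
// Function names and the i32 element type are assumptions for illustration.

/// Accumulates into a variable captured from outside the closure,
/// similar in shape to a zip body that does `s += a + b + c`.
fn sum_3_side_effect(a: &[i32], b: &[i32], c: &[i32]) -> i32 {
    let mut s = 0;
    a.iter()
        .zip(b)
        .zip(c)
        .for_each(|((&x, &y), &z)| s += x + y + z);
    s
}

/// Threads the accumulator through the closure, as a fold does.
fn sum_3_fold(a: &[i32], b: &[i32], c: &[i32]) -> i32 {
    a.iter()
        .zip(b)
        .zip(c)
        .fold(0, |acc, ((&x, &y), &z)| acc + x + y + z)
}

fn main() {
    let n = 1024;
    let a = vec![1; n];
    let b = vec![1; n];
    let c = vec![1; n];

    // Both shapes must agree on the result; only the generated code may differ.
    assert_eq!(sum_3_side_effect(&a, &b, &c), sum_3_fold(&a, &b, &c));
    println!("sum = {}", sum_3_fold(&a, &b, &c));
}
```

Whether the compiler vectorizes either form depends on the optimization level and compiler version, so inspecting the generated assembly for both functions is one way to approach the question.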
The difference is minimal with current master and nightly:
test sum_3_azip ... bench: 825 ns/iter (+/- 130)
test sum_3_azip_fold ... bench: 820 ns/iter (+/- 105)