sarah quiñones
one potential solution would be to "align" the end iterators (and maybe cache the result?) when `end` is computed, if the ranges are all sized. if the ranges aren't sized,...
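(illustrative sketch, not part of the original comment.) a rough rust analog of the "align the ends" idea for sized ranges: compute the common length once, trim every range to it, and then a single end check covers all of them. `align_ends` and `zip_dot` are hypothetical names made up for this example, and slices stand in for sized ranges.

```rust
/// hypothetical helper: trims both slices to their common length so that
/// their ends line up. this only works because slices know their size.
fn align_ends<'a, A, B>(a: &'a [A], b: &'a [B]) -> (&'a [A], &'a [B]) {
    let n = a.len().min(b.len());
    (&a[..n], &b[..n])
}

/// hypothetical consumer: after alignment, the loop needs one bound check
/// (`i < a.len()`) instead of testing every range's end separately.
fn zip_dot(a: &[i32], b: &[i32]) -> i32 {
    let (a, b) = align_ends(a, b);
    let mut sum = 0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
```

the alignment cost is paid once up front, which is where caching the result could help if the end is requested repeatedly.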
well, these don't look like the most promising results ^^'
are there tests that do?
rustc-perf seems to take forever on my machine and i can't display the results after it's finished. so that doesn't seem like a good option for me :/
thanks for the tips! i managed to get it working thanks to your help. it seems that the biggest culprit was inlining the ops::function wrappers. but even without it i...
it's not just sizes smaller than the smallest native size. for example, the above code generates the correct instructions if we use `float64`, but not with `float64` (loads/stores 4 `double`s)...
i like the idea of leaving it as an experimental feature for now, since faer is relatively new as a library and the api might still change in the future....
matmul benchmark results on my machine after enabling `gemm`:
```
mat100_mul_mat100       time:   [39.299 µs 39.346 µs 39.398 µs]
                        change: [-4.7112% -4.5478% -4.3846%] (p = 0.00 < 0.05)
                        Performance has improved.
...
```
the CI seems to be failing on cuda, but i'm not sure what's causing it. it says it can't find `aligned-vec = "0.5"`, but it's right there on crates.io: https://crates.io/crates/aligned-vec
so, gemm is based on the BLIS papers (https://github.com/flame/blis#citations). it's more or less a faithful implementation of the algorithms described in the fourth paper. i'm...
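(illustrative sketch, not part of the original comment and not the `gemm` crate's actual code.) the BLIS approach splits the product into cache-sized blocks with nested loops around a small inner kernel. the version below keeps only the loop blocking: it assumes row-major `f64` storage, uses made-up block sizes `MC`, `KC`, `NC`, and replaces the packed, vectorized micro-kernel with plain scalar loops. `blocked_matmul` is a hypothetical name.

```rust
// hypothetical block sizes; real implementations tune these to the cache hierarchy.
const MC: usize = 64;
const KC: usize = 64;
const NC: usize = 64;

/// computes c += a * b, where a is m×k, b is k×n and c is m×n, all row-major.
fn blocked_matmul(m: usize, n: usize, k: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);

    // loop over column blocks of b and c
    for jc in (0..n).step_by(NC) {
        let nb = NC.min(n - jc);
        // loop over blocks along the shared dimension
        for pc in (0..k).step_by(KC) {
            let kb = KC.min(k - pc);
            // loop over row blocks of a and c
            for ic in (0..m).step_by(MC) {
                let mb = MC.min(m - ic);
                // scalar stand-in for the micro-kernel: updates one mb×nb block of c
                for i in ic..ic + mb {
                    for p in pc..pc + kb {
                        let a_ip = a[i * k + p];
                        for j in jc..jc + nb {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

the blocking only changes the order in which the products are accumulated, so the result matches a naive triple loop up to floating-point rounding.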