Mention alignment
When working with SIMD, for example, memory alignment may matter.
It would be good if this guide also discusses ways to allocate memory for SIMD that is, e.g., aligned to 64-byte boundaries.
Thanks for the suggestion! This sounds like it could be really useful for people. Here are three barriers:
- Would this increase the scope of the project too much? Maybe SIMD should be a separate document because I know there are multiple different libraries that have different tradeoffs.
- I only know the basic ideas of SIMD and haven't used it, but I would love to learn about it!
- I just had a baby so I probably won't have much time in the near future. 😉
One thing I'd love to hear from you is: is there a specific problem that you're hoping to solve? That would help me understand what kind of information would be useful.
No worries, I do not expect this to be added immediately. It is just a suggestion.
Well, I know that SIMD can supposedly improve performance, and that it can benefit from memory alignment. Intel's documentation says that 64-byte alignment is desirable.
I have seen (with cargo asm) that my code is automatically vectorized (I am not writing explicit SIMD, which currently is a nightly experimental API), but it uses vmovupd instructions, which is the unaligned variant.
I figured out that with std::alloc::Layout one can allocate arrays that are 64-byte aligned (I tried following the Rustonomicon Vec example, but some errors remained). However, this did not yield vmovapd, the aligned variant. I guess you also need to hint to the compiler that all later accesses are aligned, which may be nightly-only for now, so it will have to wait.
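For reference, here is a minimal sketch of the std::alloc::Layout approach described above, on stable Rust. The type name AlignedBuf and its methods are hypothetical, made up for illustration. It only guarantees that the buffer starts on a 64-byte boundary; it does not by itself make the compiler emit aligned loads and stores.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::mem::size_of;
use std::ptr::NonNull;
use std::slice;

/// Hypothetical helper: a fixed-length f64 buffer that starts on a 64-byte boundary.
pub struct AlignedBuf {
    ptr: NonNull<f64>,
    len: usize,
}

impl AlignedBuf {
    /// Alignment requested from the allocator, in bytes.
    pub const ALIGN: usize = 64;

    /// Builds the layout for `len` f64 values with 64-byte alignment.
    fn layout(len: usize) -> Layout {
        let bytes = len.checked_mul(size_of::<f64>()).expect("size overflow");
        Layout::from_size_align(bytes, Self::ALIGN).expect("invalid layout")
    }

    /// Allocates `len` zero-initialized f64 values.
    pub fn zeroed(len: usize) -> Self {
        // The global allocator must not be asked for a zero-sized block.
        assert!(len > 0, "this sketch does not support empty buffers");
        let layout = Self::layout(len);
        // SAFETY: `layout` has a non-zero size (checked above).
        let raw = unsafe { alloc_zeroed(layout) } as *mut f64;
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        AlignedBuf { ptr, len }
    }

    pub fn as_slice(&self) -> &[f64] {
        // SAFETY: `ptr` points to `len` f64 values; all-zero bits are a valid f64 (0.0).
        unsafe { slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [f64] {
        // SAFETY: same as `as_slice`, and `&mut self` guarantees exclusive access.
        unsafe { slice::from_raw_parts_mut(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: the pointer was allocated in `zeroed` with this exact layout.
        unsafe { dealloc(self.ptr.as_ptr() as *mut u8, Self::layout(self.len)) };
    }
}
```

The memory must be freed with the same Layout it was allocated with, which is why the layout is rebuilt from the stored length in Drop.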
Congratulations, enjoy.
Thanks!
This detail is helpful. Here are some of the questions that we'll need to think about:
- How can we help the user avoid unsafe?
- What tools and layers of abstraction does the user need to understand? E.g. do they need to know about cargo asm?
- Is the user counting on the compiler to optimize? Or is the user using std::simd? I have also seen third-party libraries like faster and simdeez but that was a couple years ago, maybe they have been supplanted.
- Is the user on stable Rust or nightly? What version of Rust?
- How much control does the user have over the allocator? If they're writing an executable, they probably have full control but in a library it might be more complicated.
- What processors and instruction sets does the user want to target?
- If the user wants the compiler to optimize, are there any conditions that will guarantee the optimization happens? Often it seems to be best-effort.
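As one possible answer to the first question, here is a minimal sketch that gets 64-byte-aligned storage on stable Rust without any unsafe, by wrapping the data in a type with an explicit alignment attribute. The names Chunk and scale are hypothetical. This assumes the user is relying on auto-vectorization; whether the compiler then emits aligned instructions remains best-effort and would still need to be checked, for example with cargo asm.

```rust
/// Hypothetical chunk type: eight f64 values (64 bytes) aligned to a 64-byte boundary.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
#[repr(C, align(64))]
pub struct Chunk(pub [f64; 8]);

/// Multiplies every element by `factor`.
/// The fixed-size inner loop is a candidate for auto-vectorization,
/// but the compiler gives no guarantee that it will vectorize it.
pub fn scale(chunks: &mut [Chunk], factor: f64) {
    for chunk in chunks.iter_mut() {
        for x in chunk.0.iter_mut() {
            *x *= factor;
        }
    }
}
```

A Vec<Chunk> allocates its storage with the alignment of Chunk, so every element starts on a 64-byte boundary. The tradeoff is that the data length has to be padded to a multiple of eight values.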