contiguous-data-in-rust

Mention alignment

Open kno10 opened this issue 2 years ago • 3 comments

When working with SIMD, for example, memory alignment may matter.

It would be good if this guide also discusses ways to allocate memory for SIMD that is, e.g., aligned to 64-byte boundaries.

kno10, Jan 24 '23 16:01

Thanks for the suggestion! This sounds like it could be really useful for people. Here are three barriers:

  1. Would this increase the scope of the project too much? Maybe SIMD should be a separate document because I know there are multiple different libraries that have different tradeoffs.
  2. I only know the basic ideas of SIMD and haven't used it, but I would love to learn about it!
  3. I just had a baby so I probably won't have much time in the near future. 😉

One thing I'd love to hear from you is: is there a specific problem that you're hoping to solve? That would help me understand what kind of information would be useful.

paulkernfeld, Jan 24 '23 19:01

No worries, I do not expect this to be added immediately. It is just a suggestion. I know that SIMD can supposedly improve performance, and that it can benefit from memory alignment. Intel's documentation says that 64-byte alignment is desirable. I have seen (with cargo asm) that my code is automatically vectorized (I am not writing explicit SIMD, which is currently a nightly experimental API), but it uses vmovupd instructions, which is the unaligned variant. I figured out that with std::alloc::Layout one can allocate arrays that are 64-byte aligned (I tried following the Rustonomicon Vec example, but some errors remained), but this did not yield the aligned variant, vmovapd. I guess you also need to hint to the compiler that the later accesses are all aligned, which may be nightly-only for now, so it will have to wait.

Congratulations, enjoy.

kno10, Jan 25 '23 08:01
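
For readers following along, here is a minimal sketch of the std::alloc::Layout approach mentioned above. It is not the code from the comment, which was not posted; the type name AlignedBuf and its methods are hypothetical and chosen only for illustration. It allocates a fixed-length, zero-initialized f64 buffer whose start address is a multiple of 64 bytes. Whether the compiler then emits aligned loads is a separate question and is not guaranteed by this.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::slice;

/// Hypothetical helper: a fixed-length `f64` buffer aligned to 64 bytes.
pub struct AlignedBuf {
    ptr: *mut f64,
    len: usize,
    layout: Layout,
}

impl AlignedBuf {
    /// Allocate `len` zeroed elements on a 64-byte boundary.
    pub fn zeroed(len: usize) -> Self {
        // Allocating a zero-sized layout is undefined behavior, so this sketch rejects it.
        assert!(len > 0, "this sketch does not support empty buffers");
        // Start from the natural array layout, then raise the alignment to 64.
        let layout = Layout::array::<f64>(len)
            .and_then(|l| l.align_to(64))
            .expect("buffer size overflows the address space");
        // SAFETY: `layout` has a non-zero size (checked above).
        let ptr = unsafe { alloc_zeroed(layout) } as *mut f64;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        AlignedBuf { ptr, len, layout }
    }

    pub fn as_slice(&self) -> &[f64] {
        // SAFETY: `ptr` is valid for `len` elements, and all-zero bits are a valid f64 (0.0).
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [f64] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr as *mut u8, self.layout) }
    }
}
```

A caller would write something like `let mut buf = AlignedBuf::zeroed(1024);` and then work with `buf.as_mut_slice()` like any other slice. The unsafe code stays inside the helper, but the type system does not record the alignment, which is probably why the optimizer cannot rely on it.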

Thanks!

This detail is helpful. Here are some of the questions that we'll need to think about:

  • How can we help the user avoid unsafe?
  • What tools and layers of abstraction does the user need to understand? E.g. do they need to know about cargo asm?
  • Is the user counting on the compiler to optimize? Or is the user using std::simd? I have also seen third-party libraries like faster and simdeez, but that was a couple of years ago, so they may have been supplanted.
  • Is the user on stable Rust or nightly? What version of Rust?
  • How much control does the user have over the allocator? If they're writing an executable, they probably have full control but in a library it might be more complicated.
  • What processors and instruction sets does the user want to target?
  • If the user wants the compiler to optimize, are there any conditions that will guarantee the optimization happens? Often it seems to be best-effort.

paulkernfeld, Jan 25 '23 15:01
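
On the first question in the list above (avoiding unsafe), one possible direction is to put the alignment in the type with `#[repr(align(64))]`, which works on stable Rust. The sketch below is only an illustration under that assumption; Block64, zeroed_blocks and sum_blocks are hypothetical names, and nothing here guarantees that the compiler vectorizes the loop or picks aligned instructions.

```rust
/// Hypothetical wrapper: eight f64 values (64 bytes) with 64-byte alignment.
#[repr(C, align(64))]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Block64(pub [f64; 8]);

/// A Vec of blocks; Vec allocates with the element type's alignment,
/// so the backing storage starts on a 64-byte boundary.
pub fn zeroed_blocks(n: usize) -> Vec<Block64> {
    vec![Block64::default(); n]
}

/// Sum every value, accumulating per lane so the loop body has a fixed,
/// regular shape that the optimizer may vectorize (best effort only).
pub fn sum_blocks(blocks: &[Block64]) -> f64 {
    let mut lanes = [0.0f64; 8];
    for block in blocks {
        for i in 0..8 {
            lanes[i] += block.0[i];
        }
    }
    lanes.iter().sum()
}
```

Because the alignment is part of the type, a `&[Block64]` carries the guarantee with it and no unsafe code is needed. The tradeoff is that the data length must be a multiple of eight elements, or the last block has to be padded.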