Vc icon indicating copy to clipboard operation
Vc copied to clipboard

Implement ARM Neon support for Vc

Open oshadura opened this issue 8 years ago • 15 comments

  • [x] branch off mkretz/datapar
  • [x] go through the unmerged commits on mkretz/NEON (https://github.com/VcDevel/Vc/compare/mkretz/NEON) and make a list of commits that should be merged to mkretz/datapar and/or your NEON branch (let @mkretz review)
  • [x] git cherry-pick the chosen commits to be merged to mkretz/datapar
  • [x] adjust to buildsystem and datapar architecture changes (aka. make it compile & pass)
  • [x] pull request
  • [x] cherry pick the remaining commits from mkretz/NEON
  • [ ] TBD

oshadura avatar Sep 12 '16 10:09 oshadura

@mattkretz, I rechecked list and I think all commits are useful or partially useful. I checked merge conflicts and they are affecting mostly build system - travis.yml, CMakeList.txt, OptimizeforArchitecture.txt, test/CMakeLists.txt.

Can you looks please? Or better to create a list?

Maybe I can try to merge in my branch (branched off from mkretz/datapar) and then after pull when all conflicts will be solved?

oshadura avatar Sep 13 '16 15:09 oshadura

Maybe I can try to merge in my branch (branched off from mkretz/datapar) and then after pull when all conflicts will be solved?

You should give it a try. You should be able to resolve some of the cherry pick conflicts yourself. Give me the ones that are too messed up for you to resolve.

mattkretz avatar Sep 15 '16 13:09 mattkretz

What do you think about organizing our work on https://github.com/VcDevel/Vc/projects/3? Makes it easy for you to put subtasks on my to list - and for me to see where you are.

Anyway, have you thought about where the commits should be cherry-picked? Any commits that should go to mkretz/datapar ASAP?

mattkretz avatar Sep 15 '16 13:09 mattkretz

I checked how works a merge from mkretz/NEON to mkretz/datapar and already solved all conflicts, compilation of course fails, but it will be a next step. Will try to push to my branch, and currently writing you a list to cherrypick in mkrets/datapar.

I would like to ask you a question is it make sense to focus on support of all architectures: arm|aarch32|aarch64? Currently I have access to "aarch64", and can try to find old "arm"..

oshadura avatar Sep 20 '16 13:09 oshadura

a merge from mkretz/NEON to mkretz/datapar

Wait, did you do git merge mkretz/NEON then? That's not what I intended. I wanted to have the NEON commits rebased onto mkretz/datapar. Either via git cherry-pick or via git rebase -i.

Edit: Why? Because it'll be very hard (aka impossible) to rebase mkretz/datapar onto master after a merge of NEON.

support of all architectures?

Focus on aarch64 for now. It would be nice to support them all at some point. But it probably doesn't make sense for you at this point.

mattkretz avatar Sep 20 '16 13:09 mattkretz

@mattkretz , my bad, you are totally right. Anyway it was a good practice :) Is it make sense maybe to add to datapar next commits based on detection of system Vc_x86 and Vc_ARM:

https://github.com/VcDevel/Vc/commit/08efe94c58799caf78a085ff5614f2db265b5038 https://github.com/VcDevel/Vc/commit/e6b86f827cbeee56994c0e7491aa6a202583a96e

oshadura avatar Sep 21 '16 09:09 oshadura

I agree those two commits should go to mkretz/datapar. They'll not apply without a bit of extra work, but you should give it a shot.

mattkretz avatar Sep 21 '16 18:09 mattkretz

@mattkretz , I added also two more commits to be cherry-picked:

6cdad3a3fd5ade362fd315951b52b0fc393b8bf5 ccee375b184d648cbdfcdda647fded4aac2b716d 08efe94c58799caf78a085ff5614f2db265b5038 e6b86f827cbeee56994c0e7491aa6a202583a96e

oshadura avatar Oct 25 '16 09:10 oshadura

Looks about right, yes. 6cdad3a3fd5ade362fd315951b52b0fc393b8bf5 might not be important, though. I'm trying to fade out the use of the Vc_IMPL macro in mkretz/datapar. Short summary: there are two somewhat orthogonal relations to the target hardware specifics:

  1. The ABI. This is (since Vc 1.0), the second template parameter to Vc::Vector / Vc::datapar. It relates to the registers that are used to pass objects of type datapar around. The Vc_IMPL macro had an influence on the default ABI parameter.
  2. The ISA. This determines the instructions the compiler may emit when generating machine code. It can only be influenced for certain via compiler flags (if we ignore inline asm). The Vc_IMPL macro did exclude the use of some intrinsics, but that never really stopped the compiler's optimizer from transforming the code to use the more advanced instructions anyway.

Take a look at datapar/macros.h for the new macros.

mattkretz avatar Oct 25 '16 13:10 mattkretz

@oshadura Oh, and please don't push to my branches. If you think something should be merged to my branch please open a pull request instead.

mattkretz avatar Oct 25 '16 13:10 mattkretz

@mattkretz , sorry for this!!! Should I revert? I finally rebased onto mkretz/datapar and start to look on differences of implementation.

oshadura avatar Oct 25 '16 14:10 oshadura

Leave it for now. Shouldn't be a problem for me to rebase onto this time.

mattkretz avatar Oct 25 '16 15:10 mattkretz

@mattkretz, my belated wishes and Happy New Year! I am making a progress, but currently stuck with definition which packing we would like to have implemented for NEON? For example in SSE there are _m128, _m128i, _m128d, for NEON there are a lot of different "packings" (including half-wide) ~ 18?

oshadura avatar Jan 09 '17 16:01 oshadura

@mattkretz, one more question about supportfunctions test, maybe we need to add separate header for ARM, like include/Vc/cpuid.h? In "Cortex-A Series Programmer's guide in Section 20.1.7 Detecting NEON" is used /proc/cpuinfo(text) and auxiliary vector /proc/self/auxv(binary) for reading and interpreting information about the CPU.

oshadura avatar Jan 09 '17 17:01 oshadura

Thanks, a successful and happy new year to you as well!!

different "packings"

You can encode this packing in variations of the ABI tag type. E.g.

namespace datapar_abi {
  struct neon {};
  template <int N> struct partial_neon {};
}

In principle this can be done for SSE, too. Just that NEON can do it more efficiently because it has instruction support for it.

I recommend you ignore the partial_neon<N> ABI for now, though. The partial ABI will likely not see many users.

about supportfunctions test, maybe we need to add separate header for ARM, like include/Vc/cpuid.h?

Ignore those for now. Focus on the core datapar implementation as specified by the latest C++ paper of mine.

mattkretz avatar Jan 10 '17 08:01 mattkretz