Use vectorcall for clang as well
Clang supports vectorcall (https://clang.llvm.org/docs/AttributeReference.html#vectorcall), so it could be used when compiling with clang in addition to MSVC.
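As a rough sketch of what this could look like, the calling convention can be hidden behind a macro that expands differently per compiler. The macro name EXAMPLE_VECTORCALL and the function example_madd are hypothetical and not taken from the project; the sketch assumes vectorcall is only wanted on x86/x64 targets and falls back to the default convention elsewhere.

```cpp
// Hypothetical macro name, for illustration only.
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    // MSVC (and clang-cl, which also defines _MSC_VER) use the keyword form.
    #define EXAMPLE_VECTORCALL __vectorcall
#elif defined(__clang__) && (defined(__i386__) || defined(__x86_64__))
    // Clang exposes the same convention as a GNU-style attribute.
    #define EXAMPLE_VECTORCALL __attribute__((vectorcall))
#else
    // Any other compiler or target: keep the default calling convention.
    #define EXAMPLE_VECTORCALL
#endif

// The macro sits between the return type and the function name,
// which is a position both spellings accept.
float EXAMPLE_VECTORCALL example_madd(float a, float b, float c)
{
    return a * b + c;
}
```

Callers do not change: example_madd(2.0f, 3.0f, 1.0f) is written the same way whichever branch of the macro is taken.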
Good idea, I hadn't noticed this. There seem to be subtle differences between clang's default calling convention and vectorcall. By default, clang will pass up to 8 vectors by value in registers, but it doesn't appear to return aggregates by value, and I'm not sure whether it passes aggregates by value; I'd have to double check. On the other hand, vectorcall supports only up to 6 arguments by value, but it handles aggregates better.
We have to measure in a real application what the impact is to make sure it yields a net win.
It says on that page:
Homogeneous vector aggregates of up to four elements are passed in sequential SSE registers if enough are available
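To make the quoted rule concrete, here is a minimal sketch of a type that should qualify as a homogeneous vector aggregate: a struct whose four members all have the same SSE vector type. The names ExampleMatrix4, example_sum_rows and EXAMPLE_VECTORCALL are hypothetical, the code assumes an x86 target with SSE, and whether the struct really ends up in registers should be confirmed by inspecting the generated assembly.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical macro name, for illustration only.
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    #define EXAMPLE_VECTORCALL __vectorcall
#elif defined(__clang__) && (defined(__i386__) || defined(__x86_64__))
    #define EXAMPLE_VECTORCALL __attribute__((vectorcall))
#else
    #define EXAMPLE_VECTORCALL
#endif

// Four members of one identical vector type: the shape the documentation
// calls a homogeneous vector aggregate (at most four elements).
struct ExampleMatrix4
{
    __m128 row0;
    __m128 row1;
    __m128 row2;
    __m128 row3;
};

// The aggregate is taken by value. Under vectorcall the four rows are
// candidates for sequential SSE registers if enough are available.
__m128 EXAMPLE_VECTORCALL example_sum_rows(ExampleMatrix4 m)
{
    return _mm_add_ps(_mm_add_ps(m.row0, m.row1),
                      _mm_add_ps(m.row2, m.row3));
}
```

Comparing the assembly of example_sum_rows with and without the macro defined is one way to see whether the convention changes anything for a given compiler and target.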
But of course for this kind of thing, testing/benchmarking is definitely the way to go