xsimd

Runtime detection of instruction set

Open • abique opened this issue 3 years ago • 12 comments

Hi,

I am wondering whether it is possible to detect at runtime which instruction set to use. Basically, the approach would be:

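```cpp
// Common, non-template interface so that every specialization
// can be returned through the same pointer type.
struct processor_base
{
  virtual ~processor_base() = default;
  // ...
};

template <typename vector_type>
struct processor : processor_base
{
  // ...
};

template <typename vector_type>
processor_base *create_processor()
{
   return new processor<vector_type>();
}

// Proposed API (hypothetical, not an existing xsimd function): call
// create_processor with the best vector type supported by the running CPU.
processor_base *proc = xsimd::native_dispatch<create_processor>();
```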

I'm not sure whether this is already possible; after reading the examples, I've concluded that it isn't.

To be clear, my goal is to produce one executable that will be able to take advantage of the best instruction set available, with the lowest overhead possible.

Many thanks, and I hope that was the right way to ask.

Regards, Alexandre

abique, Jul 14 '20

When reading this file: https://github.com/xtensor-stack/xsimd/blob/master/include/xsimd/types/xsimd_avx512_float.hpp it seems that you'd need to move the declarations into a _xsimd::avx512 namespace. That way you could support multiple instruction sets at the same time without conflicts, right?

And you could then write namespace xsimd = _xsimd::avx512; at the very end. What do you think?

abique, Jul 14 '20

I don't think you need to move the declarations of any batch class into another namespace. All these classes are different specializations of a template class and can live together (and they actually do on processors that support all of them).
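To illustrate the point, here is a toy version of the pattern. The tag types and the batch template below are hypothetical stand-ins, not xsimd's actual definitions: one primary template, one specialization per instruction set, and all of them coexisting in the same namespace without any conflict.

```cpp
#include <cstddef>

// Hypothetical instruction-set tags (illustrative, not xsimd's real types).
struct sse2_tag {};
struct avx2_tag {};

// One primary template; each instruction set supplies its own specialization.
template <typename T, typename Arch>
struct batch;

// 128-bit registers hold 4 floats.
template <>
struct batch<float, sse2_tag>
{
    static constexpr std::size_t size = 4;
};

// 256-bit registers hold 8 floats.
template <>
struct batch<float, avx2_tag>
{
    static constexpr std::size_t size = 8;
};

// Both specializations are distinct types living side by side.
static_assert(batch<float, sse2_tag>::size == 4, "SSE2 batch holds 4 floats");
static_assert(batch<float, avx2_tag>::size == 8, "AVX2 batch holds 8 floats");
```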

Regarding runtime detection, you can query the supported instruction sets via cpuid and then return a processor that builds and returns the appropriate batch type.

JohanMabille, Jul 15 '20

So if we want every Intel option available, we compile with -mavx512f -mtune=generic, and then we can implement the dynamic dispatch.
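One caveat: passing -mavx512f to the whole build lets the compiler emit AVX-512 instructions anywhere in the program, so the binary may fault with an illegal instruction on CPUs without AVX-512. A common alternative, sketched here assuming GCC or Clang targeting x86, is to enable the wider instruction set only for selected kernels through the target function attribute and pick one at runtime. The function names and the HAVE_X86_TARGET_ATTR macro are hypothetical.

```cpp
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_TARGET_ATTR 1
#endif

// Baseline kernel, compiled with the default (portable) flags.
float sum_baseline(const float *data, std::size_t n)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += data[i];
    return acc;
}

#ifdef HAVE_X86_TARGET_ATTR
// Only this function may use AVX2 instructions; the rest of the binary
// remains runnable on CPUs without AVX2.
__attribute__((target("avx2")))
float sum_avx2(const float *data, std::size_t n)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += data[i];
    return acc;
}
#endif

// Runtime dispatch: call the AVX2 kernel only if the CPU supports it.
float sum(const float *data, std::size_t n)
{
#ifdef HAVE_X86_TARGET_ATTR
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(data, n);
#endif
    return sum_baseline(data, n);
}
```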

Do you think that the dynamic dispatch could be part of the library?

abique, Jul 15 '20

Yes that would be awesome!

JohanMabille, Jul 15 '20

I'm closing the issue since I may no longer need it.

abique, Jul 30 '20

Runtime CPU capability detection and dispatch is a fairly essential feature when you're shipping a library to end users via a packaging system. Otherwise all you can do is use the lowest common denominator (probably SSE4 right now, at least for something as widely used as NumPy). Unless this is implemented already, would it make sense to reopen this feature request?

In case it's helpful, here's how NumPy does this currently: https://numpy.org/devdocs/reference/simd/simd-optimizations.html#understanding-cpu-dispatching-how-the-numpy-dispatcher-works.

rgommers, Apr 16 '21

> Unless this is implemented already, would it make sense to reopen this feature request?

We haven't implemented it yet, so it definitely makes sense to reopen this. However, I wonder whether we should implement it in a dedicated repo. It could make sense for header-only downstream packages such as xtensor not to depend on binary files. We could also keep the source here and provide two packages (this might actually make releasing easier).

> In case it's helpful, here's how NumPy does this currently

Thanks for the pointer, that's super helpful!

JohanMabille, Apr 16 '21

Is the use case roughly similar to this one: https://godbolt.org/z/Me3xGvW4b ? I'd like to be sure of the intended usage :-)

serge-sans-paille, Apr 16 '21

Pretty much, I think. I guess it needs a cache for getBestInstructionSet to make it performant, but the principle seems right.
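A minimal sketch of such a cache. detect_instruction_set is a hypothetical stand-in for getBestInstructionSet from the linked example (its code is not reproduced here), assuming GCC or Clang builtins on x86 and a scalar fallback elsewhere. A function-local static runs the detection only on first use, and since C++11 that initialization is guaranteed to be thread-safe.

```cpp
enum class instruction_set { scalar, sse2, avx2, avx512f };

// Hypothetical stand-in for getBestInstructionSet: asks the CPU on x86
// with GCC/Clang, reports scalar everywhere else.
instruction_set detect_instruction_set()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx512f"))
        return instruction_set::avx512f;
    if (__builtin_cpu_supports("avx2"))
        return instruction_set::avx2;
    if (__builtin_cpu_supports("sse2"))
        return instruction_set::sse2;
#endif
    return instruction_set::scalar;
}

// Cached accessor: detection runs once; later calls only read the static.
instruction_set best_instruction_set()
{
    static const instruction_set cached = detect_instruction_set();
    return cached;
}
```

After the first call, hot paths pay only for a cheap guard check. A further refinement is to cache a function pointer to the selected kernel rather than the enum.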

rgommers, Apr 16 '21

xsimd is now packaged for Debian GNU/Linux. We'd appreciate this kind of functionality; it could greatly improve the value of client libraries such as dolfinx.

drew-parsons, Sep 13 '21

This is now part of 8.0.0; see https://xsimd.readthedocs.io/en/latest/api/dispatching.html
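The linked documentation describes xsimd::dispatch, which takes a list of architectures plus a functor whose call operator is templated on the architecture, and selects an implementation at runtime. The self-contained toy below imitates that shape so the mechanism is visible without xsimd installed; everything in namespace toy is hypothetical and is not xsimd's implementation. It assumes C++14 and, for the AVX2 check, GCC or Clang on x86.

```cpp
#include <cstddef>

namespace toy
{
    // Hypothetical architecture tags, each with a runtime availability check.
    struct generic_arch
    {
        static bool available() { return true; }
    };

    struct avx2_arch
    {
        static bool available()
        {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
            return __builtin_cpu_supports("avx2");
#else
            return false;
#endif
        }
    };

    // Ordered list of candidate architectures, best first.
    template <class... Archs>
    struct arch_list {};

    // Last candidate: used unconditionally as the fallback.
    template <class Last, class F, class... Args>
    auto dispatch_to(arch_list<Last>, const F &f, Args... args)
    {
        return f(Last{}, args...);
    }

    // Call f with the first available architecture, otherwise try the rest.
    template <class First, class Second, class... Rest, class F, class... Args>
    auto dispatch_to(arch_list<First, Second, Rest...>, const F &f, Args... args)
    {
        if (First::available())
            return f(First{}, args...);
        return dispatch_to(arch_list<Second, Rest...>{}, f, args...);
    }

    // Wrap a functor into a callable that picks the architecture when called.
    template <class ArchList, class F>
    auto dispatch(F f)
    {
        return [f](auto... args) { return dispatch_to(ArchList{}, f, args...); };
    }

    // Functor templated on the architecture tag; a real kernel would use
    // Arch-specific SIMD batches inside operator().
    struct sum
    {
        template <class Arch>
        float operator()(Arch, const float *data, std::size_t n) const
        {
            float acc = 0.0f;
            for (std::size_t i = 0; i < n; ++i)
                acc += data[i];
            return acc;
        }
    };
}
```

Usage: `auto dispatched = toy::dispatch<toy::arch_list<toy::avx2_arch, toy::generic_arch>>(toy::sum{});` followed by `dispatched(data, n)`. In a real build, each architecture's instantiation of the kernel would live in a translation unit compiled with that architecture's flags.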

serge-sans-paille, Sep 13 '21

Nice! Thank you.

drew-parsons, Sep 13 '21

In addition to the xsimd::dispatch function mentioned above, #906 introduces xsimd::best_arch, which provides easy access to the best architecture supported by the current compilation unit. This PR also improves the documentation on xsimd architectures, which should close this issue :-)

serge-sans-paille, Mar 06 '23