STL icon indicating copy to clipboard operation
STL copied to clipboard

Should `ranges::minmax` be auto-vectorized instead of manual vectorization.

Open AlexGuteniev opened this issue 1 year ago • 2 comments

I now see that manually-vectorizing ranges::minmax in #4384 was not necessarily a good idea. Original implementation was implemented in terms of minmax_element and was not vectorized. But if it was implemented directly, it would have been vectorized. Consider the following:

unsigned int* max_element(unsigned int*b, unsigned int* e){
    unsigned int m = *b;
    unsigned int* r = b;
    for (++b; b != e; ++b){
        if (m < *b) {
            m = *b;
            r = b;
        }
    }
    return b;
}

unsigned int max1(unsigned int*b, unsigned int* e){
    return *max_element(b, e);
}

unsigned int max2(unsigned int *b, unsigned int* e){
    unsigned int m = *b;
    for (++b; b != e; ++b){
        if (m < *b) {
            m = *b;
        }
    }
    return m;
}

max1 is not vectorized, but max2 is. see https://godbolt.org/z/fG6Yzfz4v

So we could just reimplement ranges::max_element in headers and enjoy vectorization with more context.

Question how should we proceed:

  • Keep manual vectorization forever
  • Drop it in vNext
  • Drop it any time

AlexGuteniev avatar Mar 06 '24 08:03 AlexGuteniev

We talked about this at the weekly maintainer meeting and we're in favor of implementing the ranges::minmax value-based family with dedicated code in the core header <__msvc_minmax.hpp> (currently only providing types), having <algorithm> use that, and also having vector_algorithms.cpp use that to implement _Minmax (because the import lib can never shrink until vNext).

We should check the benchmark to make sure that this doesn't regress performance significantly.

StephanTLavavej avatar Mar 06 '24 22:03 StephanTLavavej

From #4734 there's a conclusion that floating point types without /fp:fast option should be vectorized with _element approach. This has to be kept.

The compiler doesn't generate the _element apporach, doesn't try to vectorize floating min/max at all, and currently even fails to apply the scalar approach properly, see DevCom-10686775.

AlexGuteniev avatar Jun 22 '24 13:06 AlexGuteniev

As mentioned in #5010, it looks like keeping manual vectorization is best for now.

StephanTLavavej avatar Oct 16 '24 21:10 StephanTLavavej