resolve-march-native

SSE4 is special (weird) and causes some problems

Open alex-orange-UofU opened this issue 1 year ago • 4 comments

CPU in question is: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz

The problem is that the CPU supports some, but not all, of SSE4, and gcc reports this in a weird way. Specifically, the output of gcc -Q --help=target -march=native has:

  -mno-red-zone                         [disabled]
  -mno-sse4                             [disabled]
  -mnop-mcount                          [disabled]
...
  -msse4                                [disabled]
  -msse4.1                              [enabled]
  -msse4.2                              [disabled]
  -msse4a                               [disabled]

As you can see, it has both -mno-sse4 disabled and -msse4 disabled. This causes the extracted flags to include both -mno-sse4 (which is the correct flag) and -msse4 (which causes problems).
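To make the contradiction concrete, here is a minimal sketch in Python. It is not the actual extraction logic of resolve-march-native; `parse_target_help` and `naive_flags` are hypothetical helpers, and the rule "a disabled flag is emitted as its negation" is an assumption chosen only to show how two `[disabled]` lines for the same feature can yield opposite flags.

```python
import re

# Excerpt of the `gcc -Q --help=target -march=native` output quoted above.
SAMPLE = """\
  -mno-red-zone                         [disabled]
  -mno-sse4                             [disabled]
  -mnop-mcount                          [disabled]
  -msse4                                [disabled]
  -msse4.1                              [enabled]
  -msse4.2                              [disabled]
  -msse4a                               [disabled]
"""

# Matches lines of the form "  -mflag      [enabled]" or "[disabled]".
LINE = re.compile(r"^\s+(-m\S+)\s+\[(enabled|disabled)\]\s*$")


def parse_target_help(text):
    """Map each -m flag to True (enabled) or False (disabled)."""
    states = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2) == "enabled"
    return states


def naive_flags(states):
    """Hypothetical rule: keep enabled flags, negate disabled ones."""
    flags = set()
    for flag, enabled in states.items():
        if enabled:
            flags.add(flag)
        elif flag.startswith("-mno-"):
            # "-mno-sse4 [disabled]" is read as "-msse4"
            flags.add("-m" + flag[len("-mno-"):])
        else:
            # "-msse4 [disabled]" is read as "-mno-sse4"
            flags.add("-mno-" + flag[len("-m"):])
    return flags


flags = naive_flags(parse_target_help(SAMPLE))
print(sorted(f for f in flags if "sse4" in f))
```

Under this assumed rule the result contains both -msse4 and -mno-sse4, which is the conflicting pair described above.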

As some background, apparently -msse4 is a special flag (as best I can tell) that turns on all of -msse4.1, -msse4.2, and -msse4a. My processor doesn't support -msse4.2 and -msse4a, which means this is a problem, as gcc is spitting out binaries with sse4.2 opcodes.

Once I find the appropriate forum, I'll be asking gcc to clarify their documentation on the -msse4 bit, and to determine whether having -mno-sse4 as disabled is a bug or just a strange meaning.

In the meantime, would the best approach be to just treat sse4 as a special case? As far as I can tell, setting it one way or the other is not a "useful" thing to do, as I think it's just controlling all the sse4x states. My suggestion would be to just filter out the exact -msse4 flag for now ("exact" meaning the sse4x flags are not filtered out).
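The suggested workaround can be sketched as an exact-match filter. This is only an illustration of the idea, not the code that ended up in the project; `drop_exact_flags` is a hypothetical name and the input list is made up for the example.

```python
def drop_exact_flags(flags, unwanted=("-msse4",)):
    """Remove flags that match exactly; -msse4.1, -msse4.2, -msse4a are kept."""
    return [flag for flag in flags if flag not in unwanted]


# Illustrative input only, not real tool output.
extracted = ["-mno-sse4", "-msse4", "-msse4.1", "-mno-sse4.2", "-mno-sse4a"]
print(drop_exact_flags(extracted))
```

Comparing whole strings rather than prefixes is what keeps the finer-grained sse4x flags, and -mno-sse4, untouched.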

alex-orange-UofU avatar Sep 13 '24 19:09 alex-orange-UofU

Hi @alex-orange, sounds like -msse4 is tribool in nature…

> Once I find the appropriate forum, I'll be asking gcc to clarify their documentation on the -msse4 bit, and to determine whether having -mno-sse4 as disabled is a bug or just a strange meaning.

Yes please!

> In the meantime, would the best approach be to just treat sse4 as a special case? As far as I can tell, setting it one way or the other is not a "useful" thing to do, as I think it's just controlling all the sse4x states. My suggestion would be to just filter out the exact -msse4 flag for now ("exact" meaning the sse4x flags are not filtered out).

Is there a chance that you could provide the four text files produced by:

export LC_ALL=C
gcc -S -fverbose-asm -o /dev/stdout "$(mktemp --suffix=.c)" -march=native > YOUR_ARCH_HERE--assembly--native.txt
gcc -S -fverbose-asm -o /dev/stdout "$(mktemp --suffix=.c)" -march=YOUR_ARCH_HERE > YOUR_ARCH_HERE--assembly--explicit.txt
gcc -Q --help=target -march=native > YOUR_ARCH_HERE--target-help--native.txt
gcc -Q --help=target -march=YOUR_ARCH_HERE > YOUR_ARCH_HERE--target-help--explicit.txt 

They would allow me to test any new SSE4 behavior and make sure that it doesn't break without noticing in future releases.
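As a rough idea of how the two target-help files could be compared, here is a small Python sketch. It is an assumption about one possible use of the files, not the project's test code; `parse_states` and `differing_flags` are hypothetical helpers, and the two inline snippets are made-up stand-ins for the real files.

```python
import re

# Matches lines of the form "  -mflag      [enabled]" or "[disabled]".
_LINE = re.compile(r"^\s+(-m\S+)\s+\[(enabled|disabled)\]\s*$")


def parse_states(text):
    """Map each -m flag to the string 'enabled' or 'disabled'."""
    states = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2)
    return states


def differing_flags(native_text, explicit_text):
    """Return {flag: (explicit_state, native_state)} where the two differ."""
    native = parse_states(native_text)
    explicit = parse_states(explicit_text)
    return {
        flag: (explicit.get(flag), state)
        for flag, state in native.items()
        if explicit.get(flag) != state
    }


# Made-up stand-ins for the --target-help--native/explicit files.
NATIVE = "  -msse4.1      [enabled]\n  -msse4.2      [disabled]\n"
EXPLICIT = "  -msse4.1      [disabled]\n  -msse4.2      [disabled]\n"
print(differing_flags(NATIVE, EXPLICIT))
```

With real files, the strings would be read from the `--target-help--native.txt` and `--target-help--explicit.txt` outputs of the commands above.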

Thanks in advance!

hartwork avatar Sep 13 '24 21:09 hartwork

Note this is the gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116708

alex-orange-UofU avatar Sep 16 '24 16:09 alex-orange-UofU

Note, this is a Harpertown arch, but as with a lot of CPUs of that age, it just identifies as core2. Attached files:
core2--assembly--explicit.txt
core2--assembly--native.txt
core2--target-help--explicit.txt
core2--target-help--native.txt

alex-orange-UofU avatar Sep 16 '24 16:09 alex-orange-UofU

@alex-orange very cool, thanks a bunch! :tada:

I have pull request #180 about it ready for review and testing if you find time :pray:

hartwork avatar Sep 18 '24 15:09 hartwork

As stated in the PR: Works for me. Produces the same output except without -msse4.

alex-orange-UofU avatar Sep 20 '24 16:09 alex-orange-UofU

I notice you're on the gentoo bug as well. Will you be handling the process of getting a new rev into gentoo, or should I file a bug to pull in the new change? (Probably best if there's a new release of course).

alex-orange-UofU avatar Sep 20 '24 19:09 alex-orange-UofU

> I notice you're on the gentoo bug as well. Will you be handling the process of getting a new rev into gentoo, or should I file a bug to pull in the new change? (Probably best if there's a new release of course).

@alex-orange resolve-march-native 5.1.0 with the fix is now released to GitHub, PyPI and Gentoo. Enjoy :pray:

hartwork avatar Sep 20 '24 19:09 hartwork

Thanks :)

alex-orange-UofU avatar Sep 30 '24 18:09 alex-orange-UofU