resolve-march-native
SSE4 is special (weird) and causes some problems
The CPU in question is: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
The problem is that this CPU supports some, but not all, of SSE4, and gcc reports this in a weird way. Specifically, the output of gcc -Q --help=target -march=native has:
-mno-red-zone [disabled]
-mno-sse4 [disabled]
-mnop-mcount [disabled]
...
-msse4 [disabled]
-msse4.1 [enabled]
-msse4.2 [disabled]
-msse4a [disabled]
As you can see, it reports both -mno-sse4 and -msse4 as disabled. This causes the extracted flags to include both -mno-sse4 (which is the correct flag) and -msse4 (which causes problems).
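To make the ambiguity concrete, here is a minimal Python sketch that parses lines shaped like the excerpt above and lists every flag whose positive and negated forms are both reported as disabled. The function names, the regular expression, and the assumption that each line looks like `-mflag [enabled|disabled]` are illustrative only; this is not how resolve-march-native itself parses the output.

```python
import re

# Sample lines copied from the gcc output quoted in this report.
SAMPLE = """\
  -mno-red-zone               [disabled]
  -mno-sse4                   [disabled]
  -mnop-mcount                [disabled]
  -msse4                      [disabled]
  -msse4.1                    [enabled]
  -msse4.2                    [disabled]
  -msse4a                     [disabled]
"""

# Assumed line shape: a flag, whitespace, then [enabled] or [disabled].
_LINE = re.compile(r"^\s*(-m\S+)\s+\[(enabled|disabled)\]\s*$")


def parse_target_help(text):
    """Map each flag to True (enabled) or False (disabled)."""
    states = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2) == "enabled"
    return states


def ambiguous_pairs(states):
    """Return flags -mX where both -mX and -mno-X are reported as disabled."""
    result = []
    for flag, enabled in states.items():
        if flag.startswith("-mno-") or enabled:
            continue
        negated = "-mno-" + flag[len("-m"):]
        if states.get(negated) is False:
            result.append(flag)
    return sorted(result)


print(ambiguous_pairs(parse_target_help(SAMPLE)))
```

On the sample above, only -msse4 is flagged; -msse4.2 and -msse4a are disabled but have no matching -mno- line in the excerpt.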
As some background, -msse4 is apparently a special flag (as best I can tell) that turns on all of -msse4.1, -msse4.2, and -msse4a. My processor doesn't support -msse4.2 or -msse4a, so this is a problem: gcc is spitting out binaries with SSE4.2 opcodes.
Once I find the appropriate forum, I'll ask gcc to clarify their documentation on the -msse4 bit and determine whether reporting -mno-sse4 as disabled is a bug or just a strange meaning.
In the meantime, would the best approach be to just treat sse4 as a special case? As far as I can tell, setting it one way or the other is not a "useful" thing to do, as I think it's just controlling all the sse4x states. My suggestion would be to just filter out the exact -msse4 flag for now ("exact" meaning the sse4x flags are not filtered out).
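As a rough illustration of the suggestion, the following Python sketch drops only flags that match exactly, so -msse4.1, -msse4.2, -msse4a and -mno-sse4 pass through untouched. The helper name `drop_exact_flags` and the example flag list are hypothetical; this is not the code from the actual fix.

```python
def drop_exact_flags(flags, unwanted=("-msse4",)):
    """Remove flags that exactly match an entry in `unwanted`, keeping order."""
    blocked = set(unwanted)
    return [flag for flag in flags if flag not in blocked]


# Hypothetical extracted flags, shortened for illustration.
flags = ["-march=core2", "-mno-sse4", "-msse4", "-msse4.1"]
print(drop_exact_flags(flags))
```

It compares whole strings rather than prefixes because a prefix match on "-msse4" would also remove -msse4.1, which this CPU does support.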
Hi @alex-orange, sounds like -msse4 is tribool in nature…
> Once I find the appropriate forum, I'll ask gcc to clarify their documentation on the -msse4 bit and determine whether reporting -mno-sse4 as disabled is a bug or just a strange meaning.
Yes please!
> In the meantime, would the best approach be to just treat sse4 as a special case? As far as I can tell, setting it one way or the other is not a "useful" thing to do, as I think it's just controlling all the sse4x states. My suggestion would be to just filter out the exact -msse4 flag for now ("exact" meaning the sse4x flags are not filtered out).
Is there a chance that you could provide the four text files produced by:
export LC_ALL=C
gcc -S -fverbose-asm -o /dev/stdout "$(mktemp --suffix=.c)" -march=native > YOUR_ARCH_HERE--assembly--native.txt
gcc -S -fverbose-asm -o /dev/stdout "$(mktemp --suffix=.c)" -march=YOUR_ARCH_HERE > YOUR_ARCH_HERE--assembly--explicit.txt
gcc -Q --help=target -march=native > YOUR_ARCH_HERE--target-help--native.txt
gcc -Q --help=target -march=YOUR_ARCH_HERE > YOUR_ARCH_HERE--target-help--explicit.txt
They would allow me to test any new SSE4 behavior and make sure that it doesn't break unnoticed in future releases.
Thanks in advance!
Note: this is the gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116708
Note: this is a Harpertown arch, but as with a lot of CPUs of that age, it just identifies as core2.
core2--assembly--explicit.txt
core2--assembly--native.txt
core2--target-help--explicit.txt
core2--target-help--native.txt
@alex-orange very cool, thanks a bunch! :tada:
I have pull request #180 about it ready for review and testing if you find time :pray:
As stated in the PR: Works for me. Produces the same output except without -msse4.
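A quick way to check a claim like this is to compare the flag lists before and after the change. Below is a small Python sketch; the helper name `flag_difference` and both command-line strings are hypothetical stand-ins, not real tool output.

```python
import shlex


def flag_difference(before, after):
    """Return (removed, added) flags between two command-line strings."""
    old_flags = shlex.split(before)
    new_flags = shlex.split(after)
    removed = [flag for flag in old_flags if flag not in new_flags]
    added = [flag for flag in new_flags if flag not in old_flags]
    return removed, added


# Hypothetical before/after outputs, shortened for illustration.
before = "-march=core2 -mno-sse4 -msse4 -msse4.1"
after = "-march=core2 -mno-sse4 -msse4.1"
print(flag_difference(before, after))
```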
I notice you're on the gentoo bug as well. Will you be handling the process of getting a new rev into gentoo, or should I file a bug to pull in the new change? (Probably best if there's a new release of course).
@alex-orange resolve-march-native 5.1.0 with the fix is now released to GitHub, PyPI and Gentoo. Enjoy :pray:
Thanks :)