avo icon indicating copy to clipboard operation
avo copied to clipboard

instructions: GFNI, VAES and VPCLMULQDQ naming inconsistency with x/sys/cpu

Open vsivsi opened this issue 2 years ago • 8 comments

With the addition of the new Ice Lake AVX-512 instruction support in Avo, I've encountered a couple of wrinkles in the naming of the required CPUID features for a given function generated by Avo and the golang x/sys/cpu feature detection support.

A bit of history... Up until this point, all Avo CPUID feature names have:

  1. Faithfully followed the Intel naming of these feature sets, and
  2. Enabled a direct 1:1 mapping between the Avo feature name, and the corresponding golang x/sys/cpu flag.

That is, if an Avo generated function says it requires "AVX2", "AVX512F" and "AVX512DQ", those strings can be directly used to generate code to test for the truth of those requrements. (e.g. a "cpu.X86.Has%s" template can be used to generate: cpu.X86.HasAVX2 && cpu.X86.HasAVX512F && cpu.X86.HasAVX512DQ.)

So here's the icky new problem. The Intel blessed (and Avo adopted) identifiers for the GFNI and VAES instructions are literally that. And there appears to be a good reason for this, because unlike AVX512 dependent features like AVX512_VPOPCNTDQ or AVX512_BITALG, Intel is signalling here that it intends to ship SSE and/or AVX (VEX) encoded forms of these instructions that can run on hardware without AVX512 support enabled.

In fact it apparently already has, shipping SSE encoded GFNI support on the "Intel Atom processor based on Tremont microarchitecture" that lacks AVX/AVX512.

image

So what's the wrinkle? Golang's standard library has defined and implemented identifiers that lock these features to AVX512 both in naming and in the detection logic. See: https://cs.opensource.google/go/x/sys/+/refs/tags/v0.4.0:cpu/cpu_x86.go;l=133-134

This is arguably a bug because, for example, that "Intel Atom processor based on Tremont microarchitecture" will have perfectly functional SSE encoded GFNI instructions that the Go assembler will happily emit (*but see comment below), but that can never run because the standard library assumes AVX512F is a requirement.

Backing out of this likely will require a breaking change to the semantics of the x/sys/cpu library, namely, eliminating the erroneous X86.HasAVX512GNFI and X86.HasAVX512VAES symbols, and replacing them with X86.GNFI and X86.VAES and new logic to ensure they are enabled when present in the absence of AVX512 (the old versions could be maintained with current semantics and deprecated). This could take a long time, and I think there's a non-zero chance they'll just punt and say it's good enough, which would be annoying, and forever require special case code to map between the proper naming for these things and the golang implementation.

So this isn't really an issue in Avo, so much as an issue for Avo.

vsivsi avatar Jan 13 '23 23:01 vsivsi

  • Actually, I'm unsure how (or if) the Go assembler decides how to select among the VEX and EVEX, encodings for a 256-bit wide instruction. For SSE encoding the omitted "V" prefix (e.g. VPADDQ vs PADDQ) is the signal. Does it always use VEX unless an AVX512 option necessitating EVEX is used (e.g. mask, memory broadcast, ymm > ymm15, etc.)?

vsivsi avatar Jan 13 '23 23:01 vsivsi

And everything above also applies to VPCLMULQDQ including the misnaming/handling in x/sys/cpu

vsivsi avatar Jan 13 '23 23:01 vsivsi

Thanks for pointing this out. I could consider a change in avo to match. It would be a very minor break, but since GFNI isn't even in a tagged release I suspect it wouldn't affect anyone.

However, I tend to agree that we've got this right in avo and x/sys/cpu is wrong? I agree the Go project is probably not going to think this is worth a breaking change for, though.

In the event of implementing feature checks #168 I don't think having a special-case fixup for those specific ISAs is going to be that bad?

Note that @klauspost's cpuid agrees with avo here:

https://pkg.go.dev/github.com/klauspost/cpuid/v2#FeatureID

mmcloughlin avatar Jan 14 '23 19:01 mmcloughlin

Ah! Sorry I'm just now grasping what you're saying. It's not simply an issue of applying a naming transform. x/sys/cpu does not have a flag that indicates the presence of GFNI, it only has one that indicates AVX512F && GFNI.

mmcloughlin avatar Jan 14 '23 19:01 mmcloughlin

  • Actually, I'm unsure how (or if) the Go assembler decides how to select among the VEX and EVEX, encodings for a 256-bit wide instruction. For SSE encoding the omitted "V" prefix (e.g. VPADDQ vs PADDQ) is the signal. Does it always use VEX unless an AVX512 option necessitating EVEX is used (e.g. mask, memory broadcast, ymm > ymm15, etc.)?

If I recall correctly that's exactly what it does. It will use VEX unless it has to use EVEX.

avo's handling of this is not great.

https://github.com/mmcloughlin/avo/blob/05ed388d0f330946c08f331cd897f32d63d50ec5/internal/load/load.go#L786-L799

mmcloughlin avatar Jan 14 '23 19:01 mmcloughlin

Yes, on reflection this is how it would have to work, otherwise it would be impossible to write valid AVX/AVX2 code and ensure that it would not fault on hardware without AVX512.

vsivsi avatar Jan 14 '23 20:01 vsivsi

Yes, there are pure SSE2 (non+VEX)/AVX2 (VEX) versions of these. There are "duplicates":

66 0F 3A 44 /r ib PCLMULQDQ xmm1, xmm2/m128, imm8
VEX.128.66.0F3A.WIG 44 /r ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8
EVEX.128.66.0F3A.WIG 44 /r /ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8

VEX.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
EVEX.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8

I assume the VEX/EVEX versions are only selected by the assembler when using the extended registers.

In #146 I was looking for a way to restrict extended registry usage.

Example of CPU with GFNI and no AVX
 mockcpu_test.go:177: Opening GenuineIntel0090661_ElkhartLake_02_CPUID.txt
    mockcpu_test.go:180: Name: Intel Atom(R) x6425RE Processor @ 1.90GHz
    mockcpu_test.go:182: Max Function:0x1b
    mockcpu_test.go:184: Max Extended Function:0x80000008
    mockcpu_test.go:185: VendorString: GenuineIntel
    mockcpu_test.go:186: VendorID: Intel
    mockcpu_test.go:187: PhysicalCores: 4
    mockcpu_test.go:188: ThreadsPerCore: 1
    mockcpu_test.go:189: LogicalCores: 4
    mockcpu_test.go:190: Family 6 Model: 150 Stepping: 1
    mockcpu_test.go:191: Features: AESNI,CLMUL,CMOV,CMPXCHG8,CX16,ERMS,FLUSH_L1D,FXSR,FXSROPT,GFNI,IA32_ARCH_CAP,IA32_CORE_CAP,IBPB,LAHF,MD_CLEAR,MMX,MOVBE,MOVDIR64B,MOVDIRI,NX,OSXSAVE,POPCNT,RDRAND,RDSEED,RDTSCP,SHA,SPEC_CTRL_SSBD,SSE,SSE2,SSE3,SSE4,SSE42,SSSE3,STIBP,SYSCALL,SYSEE,VMX,WAITPKG,X87,XGETBV1,XSAVE,XSAVEC,XSAVEOPT,XSAVES
    mockcpu_test.go:192: Microarchitecture level: 2
    mockcpu_test.go:193: Cacheline bytes: 64
    mockcpu_test.go:194: L1 Instruction Cache: 32768 bytes
    mockcpu_test.go:195: L1 Data Cache: 32768 bytes
    mockcpu_test.go:196: L2 Cache: 1572864 bytes
    mockcpu_test.go:197: L3 Cache: 4194304 bytes
    mockcpu_test.go:198: Hz: 1900000000 Hz
    mockcpu_test.go:199: Boost: 1900000000 Hz

klauspost avatar Jan 15 '23 12:01 klauspost

I just checked and there doesn't appear to be an existing issue on the golang project for this problem. I've already prototyped it and the fix is straightforward, so I'm considering filing a new issue over there this week.

vsivsi avatar Jan 18 '23 21:01 vsivsi