secp256k1

`src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints`

Open tersec opened this issue 1 year ago • 5 comments

~/secp256k1 % gcc -c -march=native -O1 -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c                      
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_mul’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints
  475 |     __asm__ __volatile__(
      |     ^~~~~~~
~/secp256k1 % gcc -c -march=native -O1 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_mul’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints
  475 |     __asm__ __volatile__(
      |     ^~~~~~~
~/secp256k1 % gcc -c -march=native -O2 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_reduce_512’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints
  475 |     __asm__ __volatile__(
      |     ^~~~~~~
~/secp256k1 % gcc -c -march=native -O2 -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c         
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_reduce_512’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints
  475 |     __asm__ __volatile__(
      |     ^~~~~~~
~/secp256k1 % gcc -c -march=native -O3 -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_reduce_512’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints
  475 |     __asm__ __volatile__(
      |     ^~~~~~~
~/secp256k1 % gcc -c -march=native -O3 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_reduce_512’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints
  475 |     __asm__ __volatile__(
      |     ^~~~~~~
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 % lscpu 
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   16
  On-line CPU(s) list:    0-15
Vendor ID:                AuthenticAMD
  Model name:             AMD Ryzen 7 PRO 8700GE w/ Radeon 780M Graphics
    CPU family:           25
    Model:                117
    Thread(s) per core:   2
    Core(s) per socket:   8
    Socket(s):            1
    Stepping:             2
    Frequency boost:      enabled
Linux version 5.15.0-118-generic (buildd@lcy02-amd64-080) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #128-Ubuntu SMP Fri Jul 5 09:28:59 UTC 2024

Commit 68b55209f1ba3e6c0417789598f5f75649e9c14c

tersec, Oct 24 '24 03:10

I'm unable to reproduce this on my gcc 14.2.1. Can you provide some more context, please?

  • What does native resolve to? Can you provide a reproduction command with the specific arch?
  • Does this really happen with -fomit-frame-pointer (which is the default on -O1)?
  • Have you tried more recent gcc versions? Is this the original gcc in Ubuntu? If yes, can you give us instructions on how to reproduce this with a docker image?
  • Is this a regression in our code?

The referenced issues seem to have some partial answers to these questions, but those also appear to contradict your report here because -fomit-frame-pointer seems to have resolved your issue. So I'm really not sure about the details of the report.

The error message usually means that there are not enough registers, but I don't see how narrowing to a specific arch can make a (correct) gcc assume that there are fewer registers.

real-or-random, Oct 24 '24 08:10

  • native resolves to znver3. However, specifying -march=znver3, or -march=znver3 -mtune=znver3, does not result in this compiler error. The full details are in the output of gcc -march=native -Q --help=target. To excerpt from that:
gcc -march=native -Q --help=target | grep -E '(march|mcpu|mtune)='
  -march=                     		znver3
  -mcpu=                      		
  -mtune=                     		znver3
  • Yes, it really does happen with -fomit-frame-pointer. That's why I've pointedly included that variation, because I know that in past build issues it has been one question/suggestion/recommendation. Yes, yes it does. See what I already posted.
  • I have not tried with more recent gcc versions. This is, yes, the default version of gcc in Ubuntu 22.04.

No, -fomit-frame-pointer did not resolve the issue. Again, from what's already posted above:

~/secp256k1 % gcc -c -march=native -O1 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_mul’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints
  475 |     __asm__ __volatile__(
      |     ^~~~~~~

-fomit-frame-pointer is specified. Explicitly. Also for -O3:

~/secp256k1 % gcc -c -march=native -O3 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_reduce_512’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints
  475 |     __asm__ __volatile__(
      |     ^~~~~~~

-fomit-frame-pointer is specified and very definitely does not resolve this issue. Nor does it for

~/secp256k1 % gcc -c -march=native -O2 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_reduce_512’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints
  475 |     __asm__ __volatile__(
      |     ^~~~~~~

Which also explicitly and specifically specifies -fomit-frame-pointer, this time for -O2.

  • Regarding whether it's a regression, we've only ever seen it on znver3 targets, and have only seen it reproduced so far on Ubuntu 20.04, 22.04, and 24.04 (though the machine I'm testing this on now is 22.04). My suspicion is that it's not a regression for previous targets, up to and including znver2, but that it never worked with -march=native on a znver3 target. But this is speculation.

Regarding reproducing this in a Docker image, the tricky thing is that:

% gcc -c -march=native -O1 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c 
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_mul’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints
  475 |     __asm__ __volatile__(
      |     ^~~~~~~
% gcc -c -march=znver3 -mtune=znver3 -O1 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c
%

That is, gcc on this machine claims that native resolves to -march=znver3 -mtune=znver3, but those options, which would allow machine-independent (and Docker-based) reproduction, do not trigger the error under otherwise identical conditions. -march=native must be triggering something else salient too, but I have not yet identified what. The failure is 100% deterministic and consistent.

But the main point is, no, -fomit-frame-pointer does not resolve this. I was already aware of that point.

Edit to add another output of what -march=native does:

echo | gcc -### -E - -march=native 
 /usr/lib/gcc/x86_64-linux-gnu/11/cc1 -E -quiet -imultiarch x86_64-linux-gnu - "-march=znver3" -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -msse4a -mno-fma4 -mno-xop -mfma -mavx512f -mbmi -mbmi2 -maes -mpclmul -mavx512vl -mavx512bw -mavx512dq -mavx512cd -mno-avx512er -mno-avx512pf -mavx512vbmi -mavx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mavx512vpopcntdq -mavx512vbmi2 -mgfni -mvpclmulqdq -mavx512vnni -mavx512bitalg -mavx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mclflushopt -mclwb -mclzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mno-hle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mmwaitx -mno-pconfig -mpku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mrdpid -mrdrnd -mrdseed -mno-rtm -mno-serialize -mno-sgx -msha -mshstk -mno-tbm -mno-tsxldtrk -mvaes -mno-waitpkg -mwbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni --param "l1-cache-size=32" --param "l1-cache-line-size=64" --param "l2-cache-size=1024" "-mtune=znver3" -fasynchronous-unwind-tables -fstack-protector-strong -Wformat -Wformat-security -fstack-clash-protection -fcf-protection -dumpbase -
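One way to narrow down what -march=native adds beyond -march=znver3 is to compare the switches that gcc reports as enabled in each case. The following is only a rough sketch: the function names `enabled_target_flags`, `only_in_first`, and `extra_native_flags` are made up for illustration, and it assumes a POSIX shell with gcc, awk, sort, and grep available.

```shell
# Hypothetical helpers; the names are illustrative, not part of gcc.

# Read `gcc -Q --help=target` output on stdin and print the switches
# marked [enabled], one per line, sorted.
enabled_target_flags() {
    awk '$2 == "[enabled]" { print $1 }' | LC_ALL=C sort -u
}

# Print the lines of list $1 that do not appear in list $2
# (both are newline-separated strings).
only_in_first() {
    printf '%s\n' "$1" | grep -vxF -e "$2"
}

# Print the switches enabled by -march=native but not by -march=$1.
extra_native_flags() {
    native=$(gcc -march=native -Q --help=target | enabled_target_flags)
    base=$(gcc "-march=$1" -Q --help=target | enabled_target_flags)
    only_in_first "$native" "$base"
}
```

Running `extra_native_flags znver3` on the affected machine should then print only the switches that differ, which is a much shorter list than the full cc1 command line.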

tersec, Oct 24 '24 09:10

This reproduces without -march=native on another machine, with

gcc (Debian 14.2.0-7) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

And, yes, it uses -fomit-frame-pointer:

$ gcc -c -march=znver3 -mavx512f -O1 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c 
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
In function ‘secp256k1_scalar_reduce_512’,
    inlined from ‘secp256k1_scalar_mul’ at src/scalar_4x64_impl.h:868:5:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints or there are not enough registers
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints or there are not enough registers
  475 |     __asm__ __volatile__(
      |     ^~~~~~~
$ gcc -c -march=znver3 -mavx512f -O2 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c 
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_reduce_512’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints or there are not enough registers
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints or there are not enough registers
  475 |     __asm__ __volatile__(
      |     ^~~~~~~
$ gcc -march=znver3 -mavx512f -O3 -fomit-frame-pointer -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 src/secp256k1.c 
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_reduce_512’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints or there are not enough registers
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints or there are not enough registers
  475 |     __asm__ __volatile__(
      |     ^~~~~~~

The relevant flag is -mavx512f.
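Getting from the long -march=native expansion to a single flag can be automated by adding each candidate switch to the working baseline and checking whether the build still succeeds. This is a sketch under assumptions, not the procedure actually used here: `try_compile` and `find_breaking_flags` are hypothetical names, and the fixed compiler options are copied from the commands above.

```shell
# Hypothetical helper: succeed if src/secp256k1.c compiles with the given
# extra flags. The object file is discarded and diagnostics are silenced.
try_compile() {
    gcc -c -O2 -DENABLE_MODULE_EXTRAKEYS=1 -DUSE_ASM_X86_64 "$@" \
        -o /dev/null src/secp256k1.c 2>/dev/null
}

# Print every candidate flag that breaks the build when added to the baseline.
# $1: name of the compile-check command, $2: baseline flag, rest: candidates.
find_breaking_flags() {
    check="$1"; base="$2"; shift 2
    for flag in "$@"; do
        "$check" "$base" "$flag" || printf '%s\n' "$flag"
    done
}
```

For example, `find_breaking_flags try_compile -march=znver3 -mavx512f -mgfni -mvaes` run from the repository root would print each switch that fails on its own. Switches that implicitly enable -mavx512f may be reported as well, so the output is a starting point rather than a final answer.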

tersec, Oct 24 '24 09:10

Argh, all of this CPU stuff is so confusing.

> Model name: AMD Ryzen 7 PRO 8700GE w/ Radeon 780M Graphics

That's a Zen4. But znver4 is only supported on GCC 13 and newer. I share the expectation that -march=znver3 should be equivalent to -march=native on GCC 11... As you point out, the affected GCC 11 appears, for whatever reason, to be (overly) clever and adds the -mavx512f. Perhaps this was some initial/half-ready support for znver4, or simply a bug?

In any case, I see this on my machine with gcc version 14.2.1 20240910:

It works with -march=znver4 -mavx512f (and also without explicit -mavx512f, which should be implied):

$ gcc -c -march=znver4 -mavx512f -O2 -fomit-frame-pointer -DUSE_ASM_X86_64 src/secp256k1.c

It errors with -march=znver3 -mavx512f, but that's a strange set of flags because no such CPU exists:

$ gcc -c -march=znver3 -mavx512f -O2 -fomit-frame-pointer -DUSE_ASM_X86_64 src/secp256k1.c
In file included from src/scalar_impl.h:20,
                 from src/secp256k1.c:28:
src/scalar_4x64_impl.h: In function ‘secp256k1_scalar_reduce_512’:
src/scalar_4x64_impl.h:361:5: error: ‘asm’ operand has impossible constraints or there are not enough registers
  361 |     __asm__ __volatile__(
      |     ^~~~~~~
src/scalar_4x64_impl.h:475:5: error: ‘asm’ operand has impossible constraints or there are not enough registers
  475 |     __asm__ __volatile__(
      |     ^~~~~~~

I still don't know what the cause of this is, but the problem disappears with the correct flags on a recent GCC. So my conclusion so far is that this is not our bug.

real-or-random, Oct 24 '24 22:10

> I share the expectation that -march=znver3 should be equivalent to -march=native on GCC 11... As you point out, the affected GCC 11 appears, for whatever reason, to be (overly) clever and adds the -mavx512f. Perhaps this was some initial/half-ready support for znver4, or simply a bug?

Okay, -march=native can detect individual CPU features. I believe this is precisely what you observe with GCC 11. And it turns out that this auto-detection produces a broken set of flags, presumably because no one had tested it on a Zen with AVX512 (because no such CPU existed when GCC 11 was released).
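To check whether a given machine is in this situation, it may help to look at whether the kernel reports avx512f for the CPU. A small sketch follows, with the caveat that `cpu_has_flag` is a hypothetical helper and that /proc/cpuinfo is only a proxy: gcc does its own detection, so the two can disagree.

```shell
# Hypothetical helper: succeed if the cpuinfo text on stdin lists the
# feature named in $1 on a "flags" line (exact word match).
cpu_has_flag() {
    awk -v want="$1" '
        /^flags/ { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
        END { exit !found }
    '
}
```

On Linux this could be used as `cpu_has_flag avx512f < /proc/cpuinfo && gcc --version`, to see at a glance whether an AVX-512 capable CPU is paired with a compiler that predates znver4 support.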

real-or-random, Oct 24 '24 22:10

> So my conclusion so far is that this is not our bug.

Closing for now, but please don't hesitate to reply if you think my analysis is wrong, or if you believe we should do something about this.

real-or-random, Nov 01 '24 15:11