openssl icon indicating copy to clipboard operation
openssl copied to clipboard

Unable to run asm code on OpenBSD (arm64)

Open Sashan opened this issue 1 year ago • 13 comments

In order to get asm code running on OpenBSD we must place all constants into .rodata sections.

closes #23312

Sashan avatar Apr 14 '24 23:04 Sashan

Is the update to cloudflare-quiche needed as part of this?

tom-cosgrove-arm avatar Apr 15 '24 08:04 tom-cosgrove-arm

Note, when building on Apple Silicon I get

m2 $ perl Configure --banner="Done" darwin64-arm64 no-shared no-threads no-async -Wimplicit-function-declaration
Configuring OpenSSL version 3.4.0-dev for target darwin64-arm64
Using os-specific seed configuration
Created configdata.pm
Running configdata.pm
Created Makefile.in
Created Makefile
Done

m2 $ make       
perl util/mkinstallvars.pl PREFIX=. BINDIR=apps LIBDIR= INCLUDEDIR=include APPLINKDIR=ms ENGINESDIR=engines MODULESDIR=providers "VERSION=3.4.0-dev" "LDLIBS= " > builddata.pm
perl "-I." "-Iutil/perl" "-Mconfigdata" "-MOpenSSL::paramnames" "util/dofile.pl" "-oMakefile" crypto/params_idx.c.in > crypto/params_idx.c
perl util/mkinstallvars.pl "PREFIX=/usr/local" BINDIR=bin "LIBDIR=lib" INCLUDEDIR=include APPLINKDIR=include/openssl "ENGINESDIR=/usr/local/lib/engines-3" "MODULESDIR=/usr/local/lib/ossl-modules" "PKGCONFIGDIR=/usr/local/lib/pkgconfig" "CMAKECONFIGDIR=/usr/local/lib/cmake/OpenSSL" "LDLIBS= " "VERSION=3.4.0-dev" > installdata.pm
perl "-I." "-Mconfigdata" "util/dofile.pl" "-oMakefile" include/crypto/bn_conf.h.in > include/crypto/bn_conf.h
:
:
cc  -Icrypto -I. -Iinclude -Iproviders/common/include -Iproviders/implementations/include  -DBSAES_ASM -DECP_NISTZ256_ASM -DECP_SM2P256_ASM -DKECCAK1600_ASM -DMD5_ASM -DOPENSSL_BN_ASM_MONT -DOPENSSL_CPUID_OBJ -DOPENSSL_SM3_ASM -DPOLY1305_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DSM4_ASM -DVPAES_ASM -DVPSM4_ASM -fPIC -arch arm64 -O3 -Wall -Wimplicit-function-declaration -DL_ENDIAN -DOPENSSL_PIC -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-3\"" -DMODULESDIR="\"/usr/local/lib/ossl-modules\"" -DOPENSSL_BUILDING_OPENSSL -DNDEBUG  -c -o crypto/aes/libcrypto-lib-aesv8-armx.o crypto/aes/aesv8-armx.S
/var/folders/45/s6l2x9xn1vndxk4syvrgrfxh0000gn/T/aesv8-armx-361efa.s:41:2: error: unknown AArch64 fixup kind!
 add x3,x3,:lo12:Lrcon
 ^
make[1]: *** [crypto/aes/libcrypto-lib-aesv8-armx.o] Error 1
make: *** [build_sw] Error 2

tom-cosgrove-arm avatar Apr 15 '24 09:04 tom-cosgrove-arm

Note, when building on Apple Silicon I get

m2 $ perl Configure --banner="Done" darwin64-arm64 no-shared no-threads no-async -Wimplicit-function-declaration
Configuring OpenSSL version 3.4.0-dev for target darwin64-arm64
Using os-specific seed configuration
Created configdata.pm
Running configdata.pm
Created Makefile.in
Created Makefile
Done

m2 $ make       
perl util/mkinstallvars.pl PREFIX=. BINDIR=apps LIBDIR= INCLUDEDIR=include APPLINKDIR=ms ENGINESDIR=engines MODULESDIR=providers "VERSION=3.4.0-dev" "LDLIBS= " > builddata.pm
perl "-I." "-Iutil/perl" "-Mconfigdata" "-MOpenSSL::paramnames" "util/dofile.pl" "-oMakefile" crypto/params_idx.c.in > crypto/params_idx.c
perl util/mkinstallvars.pl "PREFIX=/usr/local" BINDIR=bin "LIBDIR=lib" INCLUDEDIR=include APPLINKDIR=include/openssl "ENGINESDIR=/usr/local/lib/engines-3" "MODULESDIR=/usr/local/lib/ossl-modules" "PKGCONFIGDIR=/usr/local/lib/pkgconfig" "CMAKECONFIGDIR=/usr/local/lib/cmake/OpenSSL" "LDLIBS= " "VERSION=3.4.0-dev" > installdata.pm
perl "-I." "-Mconfigdata" "util/dofile.pl" "-oMakefile" include/crypto/bn_conf.h.in > include/crypto/bn_conf.h
:
:
cc  -Icrypto -I. -Iinclude -Iproviders/common/include -Iproviders/implementations/include  -DBSAES_ASM -DECP_NISTZ256_ASM -DECP_SM2P256_ASM -DKECCAK1600_ASM -DMD5_ASM -DOPENSSL_BN_ASM_MONT -DOPENSSL_CPUID_OBJ -DOPENSSL_SM3_ASM -DPOLY1305_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DSM4_ASM -DVPAES_ASM -DVPSM4_ASM -fPIC -arch arm64 -O3 -Wall -Wimplicit-function-declaration -DL_ENDIAN -DOPENSSL_PIC -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-3\"" -DMODULESDIR="\"/usr/local/lib/ossl-modules\"" -DOPENSSL_BUILDING_OPENSSL -DNDEBUG  -c -o crypto/aes/libcrypto-lib-aesv8-armx.o crypto/aes/aesv8-armx.S
/var/folders/45/s6l2x9xn1vndxk4syvrgrfxh0000gn/T/aesv8-armx-361efa.s:41:2: error: unknown AArch64 fixup kind!
 add x3,x3,:lo12:Lrcon
 ^
make[1]: *** [crypto/aes/libcrypto-lib-aesv8-armx.o] Error 1
make: *** [build_sw] Error 2

The clouflare-quiche is accidental change. mostlikely caused by my poor git skills.

Took a look at CI-results and see the failure on Mac-OS too. will investigate and update PR with fix. please stay tuned.

Sashan avatar Apr 15 '24 09:04 Sashan

EDIT: just noticed error on cross-compilation, will take a look and ping you again

@tom-cosgrove-arm please, now it's a good time for review. all checks are green (I think). The list of changes since last time:

  • the accidental addition of cloudflare-quiche is gone from PR (thanks for spotting this).
  • all changes come from @botovq except a small tweak to arm-xlate.pl
  • change to arm-xlate.pl makes macos-14 happy as well as arm linux builds those changes do work for OpenSSL running on OpenBSD as far as I can tell.

Sashan avatar Apr 17 '24 09:04 Sashan

Hope to see this PR be merged soon!

Sankyou avatar Apr 28 '24 03:04 Sankyou

@tom-cosgrove-arm could you please look?

t8m avatar Apr 29 '24 13:04 t8m

the file crypto/aes/asm/aesv8-armx.pl is also used to build armv7 (32-bits) the adrp instruction we use to deal with address relocation on 64-bit CPU does not work on 32-bit armv7 (compilation fails)

According to 9.4.2.4 there are two ways to deal with relocation on 32-bit arm. In this particular case of aesv8-armx we can use the first variant like that:

	movw $ptr, #:lower16:.Lrcon
	movt $ptr, #:upper16:.Lrcon

Unfortunately it fails to link. The linker fails with error as follows:

Run make -s -j4
  make -s -j4
  shell: /usr/bin/bash -e {0}
/usr/lib/gcc-cross/arm-linux-gnueabihf/11/../../../../arm-linux-gnueabihf/bin/ld: crypto/aes/libcrypto-shlib-aesv8-armx.o: relocation R_ARM_THM_MOVW_ABS_NC against `a local symbol' can not be used when making a shared object; recompile with -fPIC
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:6464: libcrypto.so.3] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:3826: build_sw] Error 2
Error: Process completed with exit code 2.

The error is odd. Because if I understand the generated Makefile right the command which compiles the module reads as follows (snippet comes from Makefile):

3713 CFLAGS=-Wall -O3 -DPEDANTIC -pedantic -Wno-long-long -DUNUSEDRESULT_DEBUG -Wall -Wmissing-declarations -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wswitch -Wsign-compare -Wshadow -Wformat -Wno-type-limits -Wno-tautological-constant-out-of-range-compare -Wundef -Werror -Wmissing-prototypes -      Wstrict-prototypes
....
3743 CNF_CFLAGS=-pthread -Wa,--noexecstack
....
3755 LIB_CFLAGS=-fPIC $(CNF_CFLAGS) $(CFLAGS)
....
7258 crypto/aes/libcrypto-shlib-aesv8-armx.o: crypto/aes/aesv8-armx.S
7259	$(CC)  -Icrypto -I. -Iinclude -Iproviders/common/include -Iproviders/implementations/include  -DAES_ASM -DBSAES_ASM -DECP_NISTZ256_ASM -DGHASH_ASM -DKECCAK1600_ASM -DOPENSSL_BN_ASM_GF2m -DOPENSSL_BN_ASM_MONT -DOPENSSL_CPUID_OBJ -DPOLY1305_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM $(LIB_CFLAGS) $(LIB_CPPFLAGS) -c -o $@ crypto/aes/aesv8-armx.S
7260 crypto/aes/aesv8-armx.S: crypto/aes/asm/aesv8-armx.pl 
	CC="$(CC)" $(PERL) crypto/aes/asm/aesv8-armx.pl "$(PERLASM_SCHEME)" -Icrypto -I. -Iinclude -Iproviders/common/include -Iproviders/implementations/include $(LIB_CFLAGS) $(LIB_CPPFLAGS) -DAES_ASM -DBSAES_ASM -DECP_NISTZ256_ASM -DGHASH_ASM -DKECCAK1600_ASM -DOPENSSL_BN_ASM_GF2m -DOPENSSL_BN_ASM_MONT -DOPENSSL_CPUID_OBJ -DPOLY1305_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM $(PROCESSOR) $@

I'm not able to get the whole build log to exactly see how libcrypto-shlib-aesv8-armx.o got produced by the compiler. However the Makefile I got from artifacts is straightforward. I do believe -fPIC option is passed to the compiler. The linker error does not make sense to me.

The other alternative mentioned in 9.4.2.4 reads as follows:

	movs $ptr, #:upper8_15:#.Lrcon
	lsls $ptr, $ptr, #8
	adds $ptr, #:upper0_7:#.Lrcon
	lsls $ptr, $ptr, #8
	adds $ptr, #:lower8_15:#.Lrcon
	lsls $ptr, $ptr, #8
	adds $ptr, #:lower0_7:#.Lrcon

The build has finished successfully. but tests did fail with error as follows:

SHA3 : (KAT_Digest) : Pass
AES_GCM : (KAT_Cipher) : qemu: uncaught target signal 11 (Segmentation fault) - core dumped
../../util/wrap.pl ../../apps/openssl fipsinstall -pedantic -module ../../providers/fips.so -provider_name fips -section_name fips_sect -out ../../test/fipsmodule.cnf => 139
not ok 1 - fips install
# ------------------------------------------------------------------------------
#   Failed test 'fips install'
#   at test/recipes/00-prep_fipsmodule_cnf.t line 33.
# Looks like you failed 1 test of 1.00-prep_fipsmodule_cnf.t .. 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests 

It looks like we die in AES module. I suspect we touch an invalid address. Like the position of .Lrcon constant is miscalculated.

Hitting those corners I've decided to opt for solution no to let aesv8-armx.pl generate .rodata section for 32 bit flavour.

Sashan avatar Apr 29 '24 13:04 Sashan

The error is odd. [...] I do believe -fPIC option is passed to the compiler. The linker error does not make sense to me.

-fPIC will change the kinds of instructions the compiler emits. Since we're dealing with handwritten assembly, the choice of instructions are already burned in, so there's not much the flag can do to change anything. We have to write the assembly in a PIC-compatible form to begin with.

The pattern you've used here loads the absolute address of .Lrcon directly into $ptr. That is, the address itself (split over two instructions) is used as an operand in the code. That's not a position-independent pattern. When the assembler processes that, it emits a relocation because the address is not known yet.

Then, when you go to link it, because you're making a position-independent output (shared library), the linker also cannot resolve the relocation. It would need to be deferred to the loader, which actually knows the address the overall shared library is mapped to.

The linker is then rejecting this because shared libraries aren't supposed to have runtime .text relocations, so that most pages don't need to be dirtied and thus unshared.

What you need is a PC-relative address. Although the assembler syntaxes are similar, aarch64's ADRP + ADD/LDR pattern adds PC to the operand, so what's stored is the delta between the reference and the symbol. That's also a relocation because the assembler doesn't know the distance between .text and .rodata; in the end, all the .text sections are put together and all the .rodata sections are put together. But that relocation is allowed because it's resolved at link time. No matter where you load the shared library, the delta between .text and .rodata is fixed, which in turn fixes the delta between symbol and reference.

In x86_64, this is fairly straightforward, at least within the same shared library , because you just write whatever(%rip) and emit a larger instruction with enough room for this delta. (Though if you've heard of larger code models, those are exactly to account for getting more room in these deltas.) Other, more RISC-y, ISAs are more complex because you can't just make the instruction big enough to fit the delta. That's why aarch64 has the ADRP + ADD/LDR split; a single instruction doesn't have enough room to encode the delta.

Why does this come up when you move from .text to .rodata? It's purely a size thing. When the assembler references symbols from the same section in the same file, the distance is very small. In fact the distance is totally determined by the assembly file and independent of what else you link in, so it's totally predictable. But as soon as you cross sections, it's no longer predictable. It can get as large as the whole of libcrypto.so, or the entire calling application if it links OpenSSL statically.

Even in the single-instruction forms, 32-bit Arm is very complex. You've got both Arm mode and Thumb mode, both of which have very different forms of the relevant instructions, and impact the size of other instructions. Some of the forms use a very tricky immediate encoding where, e.g., you can encode larger multiples of 4 than you can encode odd numbers. There are some files in OpenSSL Arm assembly that are at the very very edge of fitting, such that if you added or removed an instruction, it's less aligned and no longer fits! (Being so tight is only viable because the distance is totally fixed by being a same-section + same-file reference.)

Hitting those corners I've decided to opt for solution no to let aesv8-armx.pl generate .rodata section for 32 bit flavour.

That's what we did too. We don't attempt to support XOM on 32-bit Arm in BoringSSL. There are much, much fewer patterns available to do this on 32-bit Arm. IIRC the available relocations also differ a lot by platform. (The way OpenSSL does the PC-relative addressing on 32-bit iOS is quite hairy. Thankfully, 32-bit iOS no longer exists.) If you look at compiler output, it usually establishes constant-banks near the functions... which requires reading from .text anyway. Unless some platform comes up that actually needs it (and new platforms ought to move to aarch64 anyway), I think we can ignore this.

davidben avatar Apr 29 '24 15:04 davidben

@tom-cosgrove-arm could you please look?

Sankyou avatar May 06 '24 12:05 Sankyou

Yes, will do; the other part of my job is very busy ATM

tom-cosgrove-arm avatar May 07 '24 00:05 tom-cosgrove-arm

Sorry to bother,when do you plan to review this PR? @tom-cosgrove-arm

Sankyou avatar May 15 '24 02:05 Sankyou

@tom-cosgrove-arm could you please look?

Sankyou avatar May 30 '24 03:05 Sankyou

This PR is in a state where it requires action by @openssl/committers but the last update was 30 days ago

openssl-machine avatar Jun 30 '24 00:06 openssl-machine

@tom-cosgrove-arm If you're really busy,maybe you can assign someone else to review?

Sankyou avatar Jul 18 '24 11:07 Sankyou

@tom-cosgrove-arm I'm sorry for being pushy, I think it would be great if we will get this to 3.4 so arm64 will be on par with amd64. thanks.

Sashan avatar Aug 07 '24 07:08 Sashan

ping for second review

t8m avatar Aug 29 '24 13:08 t8m

Hi @tom-cosgrove-arm, don’t want to push you too much but, is there any news or deadlines on the PR?

yavtuk avatar Sep 26 '24 12:09 yavtuk

This PR is in a state where it requires action by @openssl/committers but the last update was 30 days ago

openssl-machine avatar Oct 27 '24 00:10 openssl-machine

@tom-cosgrove-arm You haven't reviewed this PR for a very very long time. Is there any problem?

Sankyou avatar Nov 18 '24 08:11 Sankyou

@t8m can we ask anyone else from maintainers look at this PR?

yavtuk avatar Dec 13 '24 14:12 yavtuk

up

yavtuk avatar Jan 10 '25 06:01 yavtuk

@tom-cosgrove-arm

Sankyou avatar Jan 10 '25 07:01 Sankyou

It seems the review is done, can you merge this PR? @t8m

Sankyou avatar Jan 14 '25 06:01 Sankyou

24 hours has passed since 'approval: done' was set, but as this PR has been updated in that time the label 'approval: ready to merge' is not being automatically set. Please review the updates and set the label manually.

openssl-machine avatar Jan 14 '25 09:01 openssl-machine

Merged to the master branch after commit message tweaks. Thank you.

t8m avatar Jan 14 '25 11:01 t8m

crypto/aes/aesv8-armx.S:12:1: error: unknown directive
.previous
^
make[1]: *** [Makefile:5360: crypto/aes/libcrypto-lib-aesv8-armx.obj] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:2258: build_sw] Error 2

This PR breaks MinGW Arm64 builds. Looks like #23997

Andarwinux avatar Jan 14 '25 12:01 Andarwinux

@Andarwinux please open a new issue.

t8m avatar Jan 14 '25 13:01 t8m

@Andarwinux please open a new issue.

https://github.com/openssl/openssl/issues/26415

Andarwinux avatar Jan 14 '25 14:01 Andarwinux