openvpn
pkt_testdriver segmentation fault on Debian Unstable arm64 on AWS Graviton 7
Describe the bug
The pkt_testdriver test fails with a segmentation fault on the latest Git master of OpenVPN on Debian Unstable arm64. Debian Unstable on amd64 does not seem to be affected.
To Reproduce
This should be reproducible with the latest Debian Unstable container images from Docker Hub by running the usual build procedure followed by "make check". Alternatively, updating the Debian Unstable arm64 base image in Buildbot to the latest upstream (debian) version should trigger this behavior.
Expected behavior
The pkt_testdriver test should pass and not segfault.
Version information:
- OS: Debian Unstable arm64
- OpenVPN version: Git master
Additional context
Can you run the test driver under gdb and gather a backtrace ("where") when it crashes?
$ cd tests/unit_tests/openvpn
$ gdb pkt_testdriver
...
(gdb) run
...
(gdb) where
Sure, I will.
As discussed in today's community meeting, I'll rebuild the Debian system and test this again right away. If the problem persists, I'll wait a week or two. This could simply be the result of a broken compiler or something similar that might be fixed soon; packages in Debian unstable are updated very frequently.
Rebuilding the Debian unstable arm64 container image from scratch did not help. I shall retry later.
The failure is still present after a forced rebuild of the Debian unstable arm64 image. I'll go the gdb route.
gdb logs:
(gdb) run
Starting program: /tmp/openvpn/tests/unit_tests/openvpn/pkt_testdriver
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[==========] pkt tests: Running 10 test(s).
[ RUN ] test_tls_decrypt_lite_none
[ OK ] test_tls_decrypt_lite_none
[ RUN ] test_tls_decrypt_lite_auth
[ OK ] test_tls_decrypt_lite_auth
[ RUN ] test_tls_decrypt_lite_crypt
Program received signal SIGSEGV, Segmentation fault.
0x0000b5a4e11745c8 in tls_pre_decrypt_lite (tas=tas@entry=0xffffd7ab2670, state=state@entry=0xffffd7ab2420, from=from@entry=0xffffd7ab23f0, buf=buf@entry=0xffffd7ab23d8) at ../../../src/openvpn/ssl_pkt.c:323
323 uint8_t pkt_firstbyte = *BPTR(buf);
(gdb) where
#0 0x0000b5a4e11745c8 in tls_pre_decrypt_lite (tas=tas@entry=0xffffd7ab2670, state=state@entry=0xffffd7ab2420, from=from@entry=0xffffd7ab23f0, buf=buf@entry=0xffffd7ab23d8) at ../../../src/openvpn/ssl_pkt.c:323
#1 0x0000b5a4e11653d8 in test_tls_decrypt_lite_crypt (ut_state=<optimized out>) at test_pkt.c:257
#2 0x0000e043d5ca58dc in ?? () from /lib/aarch64-linux-gnu/libcmocka.so.0
#3 0x0000e043d5ca5ed8 [PAC] in _cmocka_run_group_tests () from /lib/aarch64-linux-gnu/libcmocka.so.0
#4 0x0000b5a4e11638b4 [PAC] in main () at test_pkt.c:679
As suggested by @cron2, I recompiled with -O0, and then things magically started working:
(gdb) run
Starting program: /tmp/openvpn/tests/unit_tests/openvpn/pkt_testdriver
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[==========] pkt tests: Running 10 test(s).
[ RUN ] test_tls_decrypt_lite_none
[ OK ] test_tls_decrypt_lite_none
[ RUN ] test_tls_decrypt_lite_auth
[ OK ] test_tls_decrypt_lite_auth
[ RUN ] test_tls_decrypt_lite_crypt
[ OK ] test_tls_decrypt_lite_crypt
[ RUN ] test_parse_ack
[ OK ] test_parse_ack
[ RUN ] test_calc_session_id_hmac_static
[ OK ] test_calc_session_id_hmac_static
[ RUN ] test_verify_hmac_none
[ OK ] test_verify_hmac_none
[ RUN ] test_verify_hmac_tls_auth
[ OK ] test_verify_hmac_tls_auth
[ RUN ] test_generate_reset_packet_plain
[ OK ] test_generate_reset_packet_plain
[ RUN ] test_generate_reset_packet_tls_auth
[ OK ] test_generate_reset_packet_tls_auth
[ RUN ] test_extract_control_message
[ OK ] test_extract_control_message
[==========] pkt tests: 10 test(s) run.
[ PASSED ] 10 test(s).
[Inferior 1 (process 26706) exited normally]
Tests on different AWS instances and Macs suggest that this crash only happens on 7th-generation Graviton instances. We have seen it on m7g and c7g AWS instances, but not on any other ARM machines (for example, not on c6g instances).
(gdb) print buf
$19 = {capacity = 1024, offset = 1450110702, len = 54,
data = 0xbadf1298c5e0 "8\364#\313\022\321\371\344\217"}
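The `print buf` output above shows `offset = 1450110702` against `capacity = 1024`. Assuming that `BPTR(buf)` resolves to roughly `data + offset` (as the field names suggest) and that gdb printed these values accurately (variables in optimized builds are not always shown reliably), the dereference at ssl_pkt.c:323 would read far outside the 1024-byte allocation, which would be consistent with the SIGSEGV. The following is a minimal sketch of that reasoning only; `struct sketch_buffer`, `sketch_buffer_in_bounds()` and `sketch_bptr_checked()` are hypothetical names and not OpenVPN code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the buffer printed by gdb above.
 * Field names follow the gdb output; this is not OpenVPN's definition. */
struct sketch_buffer {
    int capacity;  /* bytes allocated at data */
    int offset;    /* start of the payload within data */
    int len;       /* payload length in bytes */
    uint8_t *data; /* backing storage */
};

/* Hypothetical check: the payload [offset, offset + len) must lie inside
 * the allocation [0, capacity). The sum is computed in long long so that a
 * huge corrupted offset cannot overflow int. */
static bool sketch_buffer_in_bounds(const struct sketch_buffer *buf)
{
    assert(buf != NULL);
    if (buf->data == NULL || buf->capacity < 0 || buf->offset < 0 || buf->len < 0) {
        return false;
    }
    return (long long)buf->offset + buf->len <= buf->capacity;
}

/* Checked variant of "data + offset": returns NULL instead of forming a
 * pointer outside the allocation. With the offset from the gdb dump, an
 * unchecked data + offset would point far past the 1024-byte buffer. */
static uint8_t *sketch_bptr_checked(const struct sketch_buffer *buf)
{
    if (!sketch_buffer_in_bounds(buf)) {
        return NULL;
    }
    return buf->data + buf->offset;
}
```

A check like this would not explain why the offset is wrong only in optimized builds on m7g/c7g instances; it would merely turn a wild read into a detectable failure.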
Is this still a thing? That is, is it still crashing, is anyone working on it, or should we just close the issue and claim "it magically fixed itself"?