trafficserver icon indicating copy to clipboard operation
trafficserver copied to clipboard

quiche CI build failing

Open brbzull0 opened this issue 1 year ago • 2 comments

It's been failing for quite some time now, this is the list of failing tests.

  • [ ] active_timeout (@brbzull0 - works locally(using quictls, ubuntu 20.04)
  • [ ] ja3_fingerprint
  • [ ] proxy_protocol
  • [ ] sigusr2
  • [ ] stek_share
  • [ ] tls_0rtt_server
  • [ ] tls_client_verify3
  • [ ] tls_sni_ip_allow
  • [ ] tls_sni_with_port
  • [ ] tls_ticket
  • [ ] tls_tunnel
  • [ ] tls_tunnel_forward
  • [ ] tunnel_transform
  • [ ] txn_type (@brbzull0 - works locally(using quictls, ubuntu 20.04) - fail with bssl.
  • [ ] via (https://github.com/apache/trafficserver/issues/9783)

I will be trying them all locally and on the ci docker till we sort them all out. If you want to pick any autests and try it out, please add the notes on the side of the test name.

brbzull0 avatar Jan 22 '24 14:01 brbzull0

active_timeout there seems to be a crash un rockylinux, could be related to some library version, etc:

[Jan 22 15:35:32.154] [ET_NET 10] DIAG: <QUICNetVConnection.cc:373 (handle_received_packet)> (v_quic_net) [] failed to process packet: -9
[Jan 22 15:35:32.154] [ET_NET 10] DIAG: <QUICNetVConnection.cc:664 (_handle_interval)> (quic_net) [] is_app=0 error_code=10 reason=
[Jan 22 15:35:32.154] [ET_NET 11] DIAG: <QUICNetVConnection.cc:373 (handle_received_packet)> (v_quic_net) [] failed to process packet: -9
[Jan 22 15:35:32.155] [ET_NET 11] DIAG: <QUICNetVConnection.cc:664 (_handle_interval)> (quic_net) [] is_app=0 error_code=10 reason=
[Jan 22 15:35:33.982] [ET_NET 10] DIAG: <QUICNetVConnection.cc:118 (free_thread)> (quic_net) [] Free connection
[Jan 22 15:35:33.989] [ET_NET 11] DIAG: <QUICNetVConnection.cc:118 (free_thread)> (quic_net) [] Free connection
[Jan 22 15:35:34.056] [ET_UDP 0] DIAG: <QUICNetVConnection.cc:102 (destroy)> (quic_net) [] Destroy connection
[Jan 22 15:35:34.056] [ET_UDP 0] DIAG: <QUICNetVConnection.cc:102 (destroy)> (quic_net) [] Destroy connection
[Jan 22 15:35:40.264] [ET_UDP 0] DIAG: <QUICPacketHandler.cc:267 (_recv_packet)> (quic_sec) [e91af5e3-224fe664] client initial dcid=0x224fe66471f09d1fbc1fc9611a6fbc891ff7
[Jan 22 15:35:40.265] [ET_UDP 0] DIAG: <QUICPacketHandler.cc:267 (_recv_packet)> (quic_sec) [e91af5e3-224fe664] client initial dcid=0x224fe66471f09d1fbc1fc9611a6fbc891ff7
[Jan 22 15:35:40.265] [ET_NET 0] DIAG: <QUICNextProtocolAccept.cc:53 (mainEvent)> (v_quic) [] event 202 netvc 0x7f1761243220
[Jan 22 15:35:40.265] [ET_NET 1] DIAG: <QUICNextProtocolAccept.cc:53 (mainEvent)> (v_quic) [] event 202 netvc 0x7f1761242bd0
[Jan 22 15:35:40.266] [ET_NET 0] DIAG: <QUICNetVConnection.cc:373 (handle_received_packet)> (v_quic_net) [] failed to process packet: -9
[Jan 22 15:35:40.266] [ET_NET 0] DIAG: <QUICNetVConnection.cc:664 (_handle_interval)> (quic_net) [] is_app=0 error_code=10 reason=
[Jan 22 15:35:40.266] [ET_NET 1] DIAG: <QUICNetVConnection.cc:373 (handle_received_packet)> (v_quic_net) [] failed to process packet: -9
[Jan 22 15:35:40.266] [ET_NET 1] DIAG: <QUICNetVConnection.cc:664 (_handle_interval)> (quic_net) [] is_app=0 error_code=10 reason=
[Jan 22 15:35:41.972] [ET_NET 0] DIAG: <QUICNetVConnection.cc:118 (free_thread)> (quic_net) [] Free connection
[Jan 22 15:35:42.003] [ET_NET 1] DIAG: <QUICNetVConnection.cc:118 (free_thread)> (quic_net) [] Free connection
[Jan 22 15:35:42.068] [ET_UDP 0] DIAG: <QUICNetVConnection.cc:102 (destroy)> (quic_net) [] Destroy connection
[Jan 22 15:35:42.068] [ET_UDP 0] DIAG: <QUICNetVConnection.cc:102 (destroy)> (quic_net) [] Destroy connection
traffic_server: received signal 11 (Segmentation fault)
traffic_server - STACK TRACE: 
/tmp/sandbox/active_timeout/ts/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xce)[0x8dc520]
/lib64/libpthread.so.0(+0x12cf0)[0x7f178a07ccf0]
/lib64/libc.so.6(cfree+0x21)[0x7f1789d40e31]
/tmp/sandbox/active_timeout/ts/bin/traffic_server(_Z8ats_freePv+0x28)[0x91700f]
/tmp/sandbox/active_timeout/ts/bin/traffic_server(_ZN16HttpConfigParamsD1Ev+0x2c)[0x8e4a1a]
/lib64/libc.so.6(+0x5126c)[0x7f1789cf626c]
/lib64/libc.so.6(on_exit+0x0)[0x7f1789cf63a0]
/lib64/libc.so.6(__libc_start_main+0xec)[0x7f1789cdfd8c]
/tmp/sandbox/active_timeout/ts/bin/traffic_server(_start+0x2e)[0x8dba7e]

brbzull0 avatar Jan 22 '24 15:01 brbzull0

txn_type ( and some others ) seems to be related to bssl.

rocky8:

#0  __libc_write (nbytes=218, buf=0x7f1ff4029788, fd=54) at ../sysdeps/unix/sysv/linux/write.c:26
#1  __libc_write (fd=54, buf=0x7f1ff4029788, nbytes=218) at ../sysdeps/unix/sysv/linux/write.c:24
#2  0x0000000000b07734 in SocketManager::write (fd=54, buf=0x7f1ff4029788, size=218) at /home/jenkins/trafficserver/src/iocore/dns/../eventsystem/P_UnixSocketManager.h:142
#3  0x0000000000c6e65e in fastopen_bwrite (bio=0x7f1ff4029ef8, in=0x7f1ff4029788 "\026\003\001", insz=218) at /home/jenkins/trafficserver/src/iocore/net/BIO_fastopen.cc:131
#4  0x00007f2029ded799 in BIO_write (bio=0x7f1ff4029ef8, in=<optimized out>, inl=<optimized out>) at /root/build_h3_tools/boringssl/crypto/bio/bio.c:173
#5  0x00007f202a1bf789 in bssl::tls_flush_flight (ssl=0x7f1ff402afe8) at /root/build_h3_tools/boringssl/ssl/s3_both.cc:333
#6  0x00007f202a1b6a3a in bssl::ssl_run_handshake (hs=0x7f1ff402fd48, out_early_return=out_early_return@entry=0x7f202560266f) at /root/build_h3_tools/boringssl/ssl/handshake.cc:602
#7  0x00007f202a1d028a in SSL_do_handshake (ssl=0x7f1ff402afe8) at /root/build_h3_tools/boringssl/ssl/ssl_lib.cc:865
#8  SSL_do_handshake (ssl=0x7f1ff402afe8) at /root/build_h3_tools/boringssl/ssl/ssl_lib.cc:849
#9  0x0000000000c13f5e in SSLNetVConnection::_ssl_connect (this=0x7f202b0d83c0) at /home/jenkins/trafficserver/src/iocore/net/SSLNetVConnection.cc:2462
#10 0x0000000000c1155e in SSLNetVConnection::sslClientHandShakeEvent (this=0x7f202b0d83c0, err=@0x7f2025602fe8: 0) at /home/jenkins/trafficserver/src/iocore/net/SSLNetVConnection.cc:1608
#11 0x0000000000c105e1 in SSLNetVConnection::sslStartHandShake (this=0x7f202b0d83c0, event=1, err=@0x7f2025602fe8: 0) at /home/jenkins/trafficserver/src/iocore/net/SSLNetVConnection.cc:1272
#12 0x0000000000c0d548 in SSLNetVConnection::net_read_io (this=0x7f202b0d83c0, nh=0x7f202675aba0, lthread=0x7f202675a010) at /home/jenkins/trafficserver/src/iocore/net/SSLNetVConnection.cc:611
#13 0x0000000000c8c3da in NetHandler::process_ready_list (this=0x7f202675aba0) at /home/jenkins/trafficserver/src/iocore/net/NetHandler.cc:276
#14 0x0000000000c8c733 in NetHandler::waitForActivity (this=0x7f202675aba0, timeout=10000000) at /home/jenkins/trafficserver/src/iocore/net/NetHandler.cc:364
#15 0x0000000000cc6fd9 in EThread::execute_regular (this=0x7f202675a010) at /home/jenkins/trafficserver/src/iocore/eventsystem/UnixEThread.cc:299
#16 0x0000000000cc716f in EThread::execute (this=0x7f202675a010) at /home/jenkins/trafficserver/src/iocore/eventsystem/UnixEThread.cc:348
#17 0x0000000000cc5af1 in spawn_thread_internal (a=0x205beb0) at /home/jenkins/trafficserver/src/iocore/eventsystem/Thread.cc:68
#18 0x00007f20283e41ca in start_thread (arg=<optimized out>) at pthread_create.c:479
#19 0x00007f2028050e73 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

brbzull0 avatar Jan 23 '24 11:01 brbzull0

stek_share:

[Mar 20 14:07:18.869] traffic_server NOTE: plugin.config loading ...
[Mar 20 14:07:18.869] traffic_server NOTE: loading plugin '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so'
[Mar 20 14:07:18.873] traffic_server ERROR: unable to load '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so': /opt/lib/libnuraft.so: undefined symbol: SSL_ctrl
[Mar 20 14:07:18.873] traffic_server FATAL: unable to load '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so': /opt/lib/libnuraft.so: undefined symbol: SSL_ctrl

Could be an issue in the way we build nuraft I did try that, using the same CI commands to build it and found the same error. Weird this does not happens in other CI's

brbzull0 avatar Mar 22 '24 14:03 brbzull0

stek_share:

[Mar 20 14:07:18.869] traffic_server NOTE: plugin.config loading ...
[Mar 20 14:07:18.869] traffic_server NOTE: loading plugin '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so'
[Mar 20 14:07:18.873] traffic_server ERROR: unable to load '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so': /opt/lib/libnuraft.so: undefined symbol: SSL_ctrl
[Mar 20 14:07:18.873] traffic_server FATAL: unable to load '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so': /opt/lib/libnuraft.so: undefined symbol: SSL_ctrl

Could be an issue in the way we build nuraft I did try that, using the same CI commands to build it and found the same error. Weird this does not happens in other CI's

Yeah, I'm tinkering around to see I can get a nuraft build with boringssl, getting some build errors for now.

duke8253 avatar Mar 22 '24 15:03 duke8253

I have a fix for the stek_share plugin now, ~I'll reference a PR once that's ready.~ #11185

duke8253 avatar Mar 26 '24 15:03 duke8253

I have a fix for the stek_share plugin now, ~I'll reference a PR once that's ready.~ #11185

Latest master quiche autests are passing now: https://ci.trafficserver.apache.org/view/master/job/master/job/quiche/315/consoleFull

Running Test stek_share:.. Passed

bneradt avatar Mar 28 '24 19:03 bneradt

The cause of 0rtt test failure is probably https://github.com/apache/trafficserver/issues/11203

maskit avatar Apr 02 '24 21:04 maskit

Closing this issue. All fixed now. Thanks to everyone!

brbzull0 avatar Apr 08 '24 10:04 brbzull0