tempesta icon indicating copy to clipboard operation
tempesta copied to clipboard

HTTP/2 vs HTTPS handling

Open krizhanovsky opened this issue 3 years ago • 1 comments

This is a pull request draft, I even didn't compile it yet.

Fix #1624 with explicit ALPN verification against the type of the listening socket, inherited by all established sockets.

krizhanovsky avatar Jun 20 '22 03:06 krizhanovsky

Just caught crash on simple while :; do ./scripts/tempesta.sh --restart; sleep 10; done and wrk -d 3600 -c 1000 -t 2 -H 'connection: close' https://ubuntu:443/ and _single_CPU VM:

[  299.565199] ==================================================================
[  299.568091] BUG: KASAN: null-ptr-deref in _raw_spin_lock+0x56/0xa0
[  299.571037] Write of size 4 at addr 0000000000000004 by task ksoftirqd/0/10
[  299.573838] 
[  299.575955] CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G           O      5.10.35-kdump+ #14
[  299.579093] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[  299.582262] Call Trace:
[  299.584647]  dump_stack+0x7d/0xa3
[  299.587193]  ? _raw_spin_lock+0x56/0xa0
[  299.589855]  kasan_report.cold+0x5/0x37
[  299.592636]  ? _raw_spin_lock+0x56/0xa0
[  299.595205]  check_memory_region+0x142/0x190
[  299.597926]  _raw_spin_lock+0x56/0xa0
[  299.600436]  ? _raw_read_lock_irq+0x30/0x30
[  299.603254]  frang_tls_handler+0x4f/0xf0 [tempesta_fw]
[  299.606131]  ttls_handshake_server_step+0x1ec/0x820 [tempesta_tls]
[  299.608983]  ? ttls_handshake_server_hello+0x670/0x670 [tempesta_tls]
[  299.611868]  ? __inc_numa_state+0x14/0x90
[  299.614592]  ? __alloc_pages_slowpath.constprop.0+0xc70/0xc70
[  299.617503]  ? free_unref_page_commit+0x109/0x190
[  299.620056]  ? ttls_parse_record_hdr+0x292/0x570 [tempesta_tls]
[  299.622816]  ttls_recv+0x2fa/0x5f0 [tempesta_tls]
[  299.625572]  ss_skb_process+0xfa/0x270 [tempesta_fw]
[  299.628187]  ? ttls_handle_alert+0x50/0x50 [tempesta_tls]
[  299.630897]  ? tfw_sock_srv_exit+0x30/0x30 [tempesta_fw]
[  299.633582]  ? _raw_spin_lock+0x65/0xa0
[  299.636016]  ? _raw_read_lock_irq+0x30/0x30
[  299.638580]  tfw_tls_connection_recv+0x125/0x600 [tempesta_fw]
[  299.641310]  ? tfw_tls_connection_lost+0x30/0x30 [tempesta_fw]
[  299.644233]  ss_tcp_process_skb.constprop.0+0x280/0x470 [tempesta_fw]
[  299.647101]  ? ss_do_close+0x7b0/0x7b0 [tempesta_fw]
[  299.649818]  ss_tcp_process_data+0x1eb/0x2b0 [tempesta_fw]
[  299.652524]  ? ss_tcp_process_skb.constprop.0+0x470/0x470 [tempesta_fw]
[  299.655648]  ss_tcp_data_ready+0x79/0xe0 [tempesta_fw]
[  299.658437]  tcp_rcv_established+0x933/0xb40
[  299.661017]  ? tcp_data_queue+0xa40/0xa40
[  299.663528]  ? __bpf_redirect+0x180/0x1c0
[  299.666189]  tcp_v4_do_rcv+0x24a/0x340
[  299.668736]  tcp_v4_rcv+0x137b/0x1490
[  299.671140]  ? __inet_lookup_established+0x232/0x2c0
[  299.673743]  ? tcp_v4_early_demux+0x2d0/0x2d0
[  299.676322]  ip_protocol_deliver_rcu+0x27/0x260
[  299.678869]  ip_local_deliver_finish+0xac/0xc0
[  299.681276]  ip_local_deliver+0xe9/0x1a0
[  299.683703]  ? ip_local_deliver_finish+0xc0/0xc0
[  299.686193]  ? ip_rcv_finish_core.constprop.0+0x166/0x570
[  299.688694]  ip_rcv+0x12e/0x140
[  299.690914]  ? ip_local_deliver+0x1a0/0x1a0
[  299.693211]  ? receive_buf+0x4f0/0x4f0 [virtio_net]
[  299.695694]  ? virtqueue_get_buf_ctx_split+0x5d/0x1d0 [virtio_ring]
[  299.698372]  ? ip_rcv_finish_core.constprop.0+0x570/0x570
[  299.700831]  ? ip_local_deliver+0x1a0/0x1a0
[  299.703088]  __netif_receive_skb_one_core+0xb9/0xf0
[  299.705390]  ? napi_gro_receive+0x250/0x250
[  299.707739]  ? _raw_spin_lock+0x65/0xa0
[  299.710048]  ? _raw_read_lock_irq+0x30/0x30
[  299.712227]  ? virtnet_poll+0x201/0x30f [virtio_net]
[  299.714572]  process_backlog+0xe3/0x270
[  299.716729]  napi_poll+0xfe/0x2f0
[  299.719001]  net_rx_action+0x14f/0x300
[  299.721230]  ? napi_poll+0x2f0/0x2f0
[  299.723398]  ? __kernel_fpu_begin_mask+0x7c/0x120
[  299.725705]  __do_softirq+0xe9/0x368
[  299.727898]  ? takeover_tasklets+0x2b0/0x2b0
[  299.730237]  run_ksoftirqd+0x15/0x20
[  299.732379]  smpboot_thread_fn+0x194/0x2c0
[  299.734562]  ? smpboot_register_percpu_thread+0x180/0x180
[  299.736912]  ? smpboot_register_percpu_thread+0x180/0x180
[  299.739413]  kthread+0x1c6/0x1f0
[  299.741391]  ? kthread_create_worker_on_cpu+0xa0/0xa0
[  299.743614]  ret_from_fork+0x1f/0x30
[  299.745664] ==================================================================
[  299.748438] Disabling lock debugging due to kernel taint
[  299.750990] BUG: kernel NULL pointer dereference, address: 0000000000000004
[  299.753648] #PF: supervisor write access in kernel mode
[  299.755985] #PF: error_code(0x0002) - not-present page
[  299.758391] PGD 0 P4D 0 
[  299.760413] Oops: 0002 [#1] SMP KASAN PTI
[  299.762691] CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G    B      O      5.10.35-kdump+ #14
[  299.765459] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[  299.768339] RIP: 0010:_raw_spin_lock+0x6e/0xa0
[  299.770639] Code: be 04 00 00 00 c7 44 24 20 00 00 00 00 e8 5a 83 78 ff be 04 00 00 00 48 8d 7c 24 20 e8 4b 83 78 ff ba 01 00 00 00 8b 44 24 20 <3e> 0f b1 55 00 75 19 48 b8 00 00 00 00 00 fc ff df 48 c7 04 03 00
[  299.776718] RSP: 0018:ffffc900000d7170 EFLAGS: 00010297
[  299.779385] RAX: 0000000000000000 RBX: 1ffff9200001ae2e RCX: ffffffffba2487c5
[  299.782335] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffc900000d7190
[  299.785132] RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000003
[  299.787955] R10: fffff5200001ae32 R11: 0000000000000001 R12: 0000000000000000
[  299.790784] R13: 0000000000000004 R14: ffff888102248040 R15: ffff8881063133d8
[  299.793724] FS:  0000000000000000(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000
[  299.796642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  299.799337] CR2: 0000000000000004 CR3: 000000011690c006 CR4: 0000000000370ef0
[  299.802180] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  299.805062] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  299.807849] Call Trace:
[  299.810107]  ? _raw_read_lock_irq+0x30/0x30
[  299.812611]  frang_tls_handler+0x4f/0xf0 [tempesta_fw]
[  299.815265]  ttls_handshake_server_step+0x1ec/0x820 [tempesta_tls]
[  299.818027]  ? ttls_handshake_server_hello+0x670/0x670 [tempesta_tls]
[  299.820793]  ? __inc_numa_state+0x14/0x90
[  299.823211]  ? __alloc_pages_slowpath.constprop.0+0xc70/0xc70
[  299.826002]  ? free_unref_page_commit+0x109/0x190
[  299.828587]  ? ttls_parse_record_hdr+0x292/0x570 [tempesta_tls]
[  299.831350]  ttls_recv+0x2fa/0x5f0 [tempesta_tls]
[  299.833983]  ss_skb_process+0xfa/0x270 [tempesta_fw]
[  299.836795]  ? ttls_handle_alert+0x50/0x50 [tempesta_tls]
[  299.839638]  ? tfw_sock_srv_exit+0x30/0x30 [tempesta_fw]
[  299.842431]  ? _raw_spin_lock+0x65/0xa0
[  299.845088]  ? _raw_read_lock_irq+0x30/0x30
[  299.847645]  tfw_tls_connection_recv+0x125/0x600 [tempesta_fw]
[  299.850580]  ? tfw_tls_connection_lost+0x30/0x30 [tempesta_fw]
[  299.853320]  ss_tcp_process_skb.constprop.0+0x280/0x470 [tempesta_fw]
[  299.856412]  ? ss_do_close+0x7b0/0x7b0 [tempesta_fw]
[  299.859109]  ss_tcp_process_data+0x1eb/0x2b0 [tempesta_fw]
[  299.861962]  ? ss_tcp_process_skb.constprop.0+0x470/0x470 [tempesta_fw]
[  299.864959]  ss_tcp_data_ready+0x79/0xe0 [tempesta_fw]
[  299.867760]  tcp_rcv_established+0x933/0xb40
[  299.870502]  ? tcp_data_queue+0xa40/0xa40
[  299.873043]  ? __bpf_redirect+0x180/0x1c0
[  299.875557]  tcp_v4_do_rcv+0x24a/0x340
[  299.878168]  tcp_v4_rcv+0x137b/0x1490
[  299.880687]  ? __inet_lookup_established+0x232/0x2c0
[  299.883332]  ? tcp_v4_early_demux+0x2d0/0x2d0
[  299.885890]  ip_protocol_deliver_rcu+0x27/0x260
[  299.888552]  ip_local_deliver_finish+0xac/0xc0
[  299.891141]  ip_local_deliver+0xe9/0x1a0
[  299.893629]  ? ip_local_deliver_finish+0xc0/0xc0
[  299.896126]  ? ip_rcv_finish_core.constprop.0+0x166/0x570
[  299.898797]  ip_rcv+0x12e/0x140
[  299.901037]  ? ip_local_deliver+0x1a0/0x1a0
[  299.903466]  ? receive_buf+0x4f0/0x4f0 [virtio_net]
[  299.905899]  ? virtqueue_get_buf_ctx_split+0x5d/0x1d0 [virtio_ring]
[  299.908637]  ? ip_rcv_finish_core.constprop.0+0x570/0x570
[  299.911147]  ? ip_local_deliver+0x1a0/0x1a0
[  299.913431]  __netif_receive_skb_one_core+0xb9/0xf0
[  299.915897]  ? napi_gro_receive+0x250/0x250
[  299.918311]  ? _raw_spin_lock+0x65/0xa0
[  299.920562]  ? _raw_read_lock_irq+0x30/0x30
[  299.922901]  ? virtnet_poll+0x201/0x30f [virtio_net]
[  299.925218]  process_backlog+0xe3/0x270
[  299.927385]  napi_poll+0xfe/0x2f0
[  299.929731]  net_rx_action+0x14f/0x300
[  299.931934]  ? napi_poll+0x2f0/0x2f0
[  299.934133]  ? __kernel_fpu_begin_mask+0x7c/0x120
[  299.936475]  __do_softirq+0xe9/0x368
[  299.938647]  ? takeover_tasklets+0x2b0/0x2b0
[  299.940934]  run_ksoftirqd+0x15/0x20
[  299.943078]  smpboot_thread_fn+0x194/0x2c0
[  299.945195]  ? smpboot_register_percpu_thread+0x180/0x180
[  299.947499]  ? smpboot_register_percpu_thread+0x180/0x180
[  299.949838]  kthread+0x1c6/0x1f0
[  299.951914]  ? kthread_create_worker_on_cpu+0xa0/0xa0
[  299.954178]  ret_from_fork+0x1f/0x30
[  299.956244] Modules linked in: tempesta_fw(O) tempesta_db(O) tempesta_tls(O) tempesta_lib(O) intel_rapl_msr ppdev intel_rapl_common rapl sg serio_raw parport_pc parport button loop sch_fq_codel ipmi_devintf ipmi_msghandler msr configfs ip_tables x_tables ext4 crc16 mbcache jbd2 efivars linear md_mod virtio_net net_failover failover virtio_blk sr_mod cdrom ata_generic bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea ata_piix sysfillrect sysimgblt fb_sys_fops libata drm crct10dif_pclmul virtio_pci psmouse virtio_ring virtio scsi_mod i2c_piix4 [last unloaded: tempesta_lib]
[  299.971991] CR2: 0000000000000004
[  299.974368] ---[ end trace 6947d81dcc0a1ea5 ]---
[  299.976900] RIP: 0010:_raw_spin_lock+0x6e/0xa0
[  299.979495] Code: be 04 00 00 00 c7 44 24 20 00 00 00 00 e8 5a 83 78 ff be 04 00 00 00 48 8d 7c 24 20 e8 4b 83 78 ff ba 01 00 00 00 8b 44 24 20 <3e> 0f b1 55 00 75 19 48 b8 00 00 00 00 00 fc ff df 48 c7 04 03 00
[  299.986061] RSP: 0018:ffffc900000d7170 EFLAGS: 00010297
[  299.988750] RAX: 0000000000000000 RBX: 1ffff9200001ae2e RCX: ffffffffba2487c5
[  299.991695] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffc900000d7190
[  299.994830] RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000003
[  299.997806] R10: fffff5200001ae32 R11: 0000000000000001 R12: 0000000000000000
[  300.000722] R13: 0000000000000004 R14: ffff888102248040 R15: ffff8881063133d8
[  300.003817] FS:  0000000000000000(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000
[  300.007206] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  300.010063] CR2: 0000000000000004 CR3: 000000011690c006 CR4: 0000000000370ef0
[  300.013066] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  300.016161] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  300.019112] Kernel panic - not syncing: Fatal exception in interrupt

Apparently frang_tls_handler() is called on not yet initialized sk->sk_security. frang_conn_new() is called under the socket lock (see the upcoming patch with the spin lock assertion), so there could be no race and it seems frang_conn_new() wasn't called at all. I still didn't figure out whether this is a problem of this particular branch of the master - I saw the crash only once.

krizhanovsky avatar Jul 14 '22 00:07 krizhanovsky

Caught crash, which is fixed in https://github.com/tempesta-tech/tempesta/pull/1643/commits/70daa648eff757b419e8bcc2643180691172de64

[21157.520413] kernel BUG at /root/tempesta/fw/t/unit/../../str.c:564!
[21157.523559] invalid opcode: 0000 [#1] SMP KASAN PTI
[21157.527195] CPU: 2 PID: 50116 Comm: python3 Tainted: G    B      O      5.10.35-kdump+ #16
[21157.527613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[21157.533690] RIP: 0010:tfw_str_collect_cmp+0x7a/0x170 [tempesta_fw]
[21157.537676] Code: 48 8b 03 48 89 45 00 48 8b 43 08 48 89 45 08 48 8b 43 10 48 89 45 10 48 8b 43 18 48 89 45 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 5b 48 89 ef be 20 00 00 00 5d 41 5c 41 5d 41 5e 41 5f e9 5d
[21157.544242] RSP: 0018:ffffc900001c8388 EFLAGS: 00010206
[21157.547849] RAX: 0000000000000000 RBX: ffff8880b7ae6630 RCX: dffffc0000000000
[21157.547849] RDX: 0000000000000003 RSI: ffff8880b7ae6630 RDI: ffff8880b7ae6648
[21157.553778] RBP: ffffc900001c84e8 R08: ffffffffc0aa69bf R09: 0000000000000000
[21157.558393] R10: ffffed1016f5cc7d R11: 0000000000000001 R12: ffff8880b7ae6630
[21157.563905] R13: 00000000041fd200 R14: ffffffffc0ac8e80 R15: ffff888104b070e0
[21157.568581] FS:  00007f6090579640(0000) GS:ffff88811b700000(0000) knlGS:0000000000000000
[21157.568581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21157.573659] CR2: 00007f322887fac0 CR3: 00000000b7a06001 CR4: 0000000000370ee0
[21157.578910] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21157.583676] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[21157.583676] Call Trace:
[21157.589066]  <IRQ>
[21157.589066]  tfw_http_search_cookie+0x1ea/0x230 [tempesta_fw]
[21157.594456]  ? tfw_http_verify_hdr_field+0x150/0x150 [tempesta_fw]
[21157.599364]  ? __http_msg_hdr_val+0x19c/0x1d0 [tempesta_fw]
[21157.603770]  match_cookie+0x1a1/0x2e0 [tempesta_fw]
[21157.604633]  ? tfw_http_search_cookie+0x230/0x230 [tempesta_fw]
[21157.609529]  ? tfw_http_req_client_link+0xbc/0x300 [tempesta_fw]
[21157.613743]  ? tfw_h2_append_predefined_body+0x540/0x540 [tempesta_fw]
[21157.613743]  do_eval+0xc1/0x190 [tempesta_fw]
[21157.619785]  tfw_http_match_req+0x38/0x50 [tempesta_fw]
[21157.623980]  tfw_http_tbl_action+0x14c/0x1d0 [tempesta_fw]
[21157.623980]  tfw_http_req_process+0x30c/0x9c0 [tempesta_fw]
[21157.629918]  ? policy_node+0x42/0x60
[21157.633732]  ? tfw_http_req_cache_cb+0x6d0/0x6d0 [tempesta_fw]
[21157.633732]  ? memset+0x20/0x40
[21157.640122]  ? tfw_http_init_parser_req+0x5f/0x70 [tempesta_fw]
[21157.643692]  ? tfw_http_conn_msg_alloc+0x1b4/0x370 [tempesta_fw]
[21157.643692]  ? tfw_http_msg_process_generic+0xc3/0x260 [tempesta_fw]
[21157.650332]  tfw_connection_recv+0x91/0xe0 [tempesta_fw]
[21157.653717]  ss_tcp_process_skb.constprop.0+0x280/0x470 [tempesta_fw]
[21157.653717]  ? ss_conn_drop_guard_exit+0x70/0x70 [tempesta_fw]
[21157.660571]  ss_tcp_process_data+0x1ef/0x2a0 [tempesta_fw]
[21157.663631]  ? ss_tcp_process_skb.constprop.0+0x470/0x470 [tempesta_fw]
[21157.663631]  ? tcp_event_data_recv+0x292/0x5d0
[21157.670709]  ss_tcp_data_ready+0x68/0x100 [tempesta_fw]
[21157.673929]  tcp_data_queue+0x5d8/0xa40
[21157.673929]  ? tcp_data_ready+0x1e0/0x1e0
[21157.680937]  ? ktime_get+0x40/0xa0
[21157.683666]  tcp_rcv_established+0x3ea/0xb40
[21157.683666]  ? tcp_data_queue+0xa40/0xa40

krizhanovsky avatar Aug 16 '22 22:08 krizhanovsky

BTW, there are some other cases of freeing the list member without explicit list_del()/list_del_init() in sock_clnt.c, which might cause problems.

s0nx avatar Sep 01 '22 09:09 s0nx

Just caught a crash on the current version of the branch on multiple_listeners.test_on_the_fly.TestOnTheFly.test_change_config_on_the_fly test from https://github.com/tempesta-tech/tempesta-test/pull/275:

[696585.558773] BUG: KASAN: use-after-free in tfw_sock_clnt_new+0x21f/0x460 [tempesta_fw]
[696585.561162] Read of size 24 at addr ffff88811af2f600 by task wrk/76566
[696585.563332]
[696585.564933] CPU: 3 PID: 76566 Comm: wrk Tainted: G           O      5.10.35-kdump+ #16
[696585.565344] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[696585.565344] Call Trace:
[696585.565344]  <IRQ>
[696585.565344]  dump_stack+0x7d/0xa3
[696585.565344]  print_address_description.constprop.0+0x1c/0x1f0
[696585.565344]  ? _raw_spin_lock_irqsave+0x6b/0xb0
[696585.565344]  ? _raw_write_lock_irqsave+0xb0/0xb0
[696585.565344]  ? tfw_sock_clnt_new+0x21f/0x460 [tempesta_fw]
[696585.565344]  kasan_report.cold+0x1f/0x37
[696585.565344]  ? tfw_sock_clnt_new+0x21f/0x460 [tempesta_fw]
[696585.565344]  check_memory_region+0x142/0x190
[696585.565344]  tfw_sock_clnt_new+0x21f/0x460 [tempesta_fw]
[696585.565344]  ? tfw_cli_conn_release+0xb0/0xb0 [tempesta_fw]
[696585.565344]  ss_tcp_state_change+0x137/0x320 [tempesta_fw]
[696585.565344]  tcp_rcv_state_process+0x6b7/0xc10
[696585.565344]  ? tcp_rcv_synsent_state_process+0xd20/0xd20
[696585.565344]  ? tcp_ca_openreq_child+0x1c0/0x1c0
[696585.565344]  ? bpf_get_listener_sock+0x31/0x80
[696585.565344]  tcp_child_process+0x14f/0x2d0
[696585.565344]  tcp_v4_rcv+0x100e/0x1490
.....
[696585.565344] Allocated by task 76532:
[696585.565344]  kasan_save_stack+0x1b/0x40
[696585.565344]  __kasan_kmalloc.constprop.0+0xbf/0xd0
[696585.565344]  kmem_cache_alloc_trace+0x220/0x3b0
[696585.565344]  tfw_listen_sock_add+0x179/0x240 [tempesta_fw]
[696585.565344]  tfw_cfgop_listen+0x163/0x240 [tempesta_fw]
[696585.565344]  spec_handle_entry+0x80/0xb0 [tempesta_fw]
[696585.565344]  tfw_cfg_parse_mods+0x24a/0x2d0 [tempesta_fw]
[696585.565344]  tfw_cfg_parse+0x73/0xb0 [tempesta_fw]
[696585.565344]  tfw_start+0x5c/0x190 [tempesta_fw]
[696585.565344]  tfw_ctlfn_state_io+0x18b/0x1c0 [tempesta_fw]
[696585.565344]  proc_sys_call_handler+0x1c3/0x300
[696585.565344]  new_sync_write+0x22b/0x320
[696585.565344]  vfs_write+0x2ae/0x350
[696585.565344]  ksys_write+0xa0/0x120
[696585.565344]  do_syscall_64+0x2d/0x70
[696585.565344]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[696585.565344]
[696585.565344] Freed by task 76583:
[696585.565344]  kasan_save_stack+0x1b/0x40
[696585.565344]  kasan_set_track+0x1c/0x30
[696585.565344]  kasan_set_free_info+0x1b/0x30
[696585.565344]  __kasan_slab_free+0xec/0x130
[696585.565344]  kfree+0x8c/0x110
[696585.565344]  tfw_sock_clnt_start+0x407/0x5a0 [tempesta_fw]
[696585.565344]  tfw_start+0xd3/0x190 [tempesta_fw]
[696585.565344]  tfw_ctlfn_state_io+0x18b/0x1c0 [tempesta_fw]
[696585.565344]  proc_sys_call_handler+0x1c3/0x300
[696585.565344]  new_sync_write+0x22b/0x320
[696585.565344]  vfs_write+0x2ae/0x350
[696585.565344]  ksys_write+0xa0/0x120
[696585.565344]  do_syscall_64+0x2d/0x70
[696585.565344]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[696585.565344]
[696585.565344] The buggy address belongs to the object at ffff88811af2f600
[696585.565344]  which belongs to the cache kmalloc-96 of size 96
[696585.565344] The buggy address is located 0 bytes inside of
[696585.565344]  96-byte region [ffff88811af2f600, ffff88811af2f660)
[696585.565344] The buggy address belongs to the page:
[696585.565344] page:0000000084c5d3fb refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11af2f
[696585.565344] flags: 0x2fff80000000200(slab)
[696585.565344] raw: 02fff80000000200 ffffea00040c4488 ffffea000412eec8 ffff888100040600
[696585.565344] raw: 0000000000000000 ffff88811af2f000 0000000100000020 ffff88810096a001
[696585.565344] page dumped because: kasan: bad access detected
[696585.565344] page->mem_cgroup:ffff88810096a001

The problem is caused by the race between tfw_sock_clnt_new() callback and tfw_sock_clnt_start(): while a new child socket is created and initialized by the listening socket, the listening socket is freed during the reconfiguration.

Fixed in https://github.com/tempesta-tech/tempesta/pull/1643/commits/e8cfb48efb3a17aa87b7f25c6bc83d62fa51f5a8

krizhanovsky avatar Sep 03 '22 12:09 krizhanovsky

The tests are broken. On CI I http://tempesta-vm.cloud.cherryservers.net:8080/job/tempesta-test%20PR/211/console do see these tests failed:

cache.test_purge_resp_hch.HeavyChunkedPurgeRespTest
tls.test_tls_tickets.TlsTicketTest

The both pass in my VM with Ubuntu 22 and tests master at 8efb2353a2b89a542de4e1783f9913fc32b924a7 . The last test throws though this warning:

/root/tempesta-test/tls/scapy_ssl_tls/ssl_tls_crypto.py:1093: UserWarning: Verification of GCM tag failed: MAC check failed
  warnings.warn("Verification of GCM tag failed: %s" % why)=

However, on my system these tests fail:

http_rules.test_http_tables.HttpTablesTestMarkRules
tls.test_tls_cert.TlsCertSelect
ws.test_ws_ping.WssPingProxy
ws.test_ws_ping.WssStress
test_limited_ka in regression.test_stress_failovering.HashFailovering
test_unlimited_ka in regression.test_stress_failovering.HashFailovering
test_limited_ka in regression.test_stress_failovering.RatioFailovering
test_unlimited_ka in regression.test_stress_failovering.RatioFailovering
regression.test_stress_pipeline.Pipeline
----------------------------------------------------------------------
Ran 355 tests in 2013.741s

FAILED (failures=5, errors=4, skipped=4)

The first test is about https://github.com/tempesta-tech/tempesta-test/issues/285 , the TLS one is https://github.com/tempesta-tech/tempesta-test/issues/302 .

I reran the failed CI tests on my VM and they do pass:

# ./run_tests.py -n cache.test_purge_resp_hch.HeavyChunkedPurgeRespTest tls.test_tls_tickets.TlsTicketTest
....
----------------------------------------------------------------------
Running functional tests...
----------------------------------------------------------------------

test_0_purge_resp_non_hch (cache.test_purge_resp_hch.HeavyChunkedPurgeRespTest) ... ok
test_1_purge_resp_hch (cache.test_purge_resp_hch.HeavyChunkedPurgeRespTest) ... ok
test_invalid_ticket (tls.test_tls_tickets.TlsTicketTest) ... /root/tempesta-test/tls/scapy_ssl_tls/ssl_tls_crypto.py:1093: UserWarning: Verification of GCM tag failed: MAC check failed
  warnings.warn("Verification of GCM tag failed: %s" % why)
ok
test_no_ticket_support (tls.test_tls_tickets.TlsTicketTest) ... ok

----------------------------------------------------------------------
Ran 4 tests in 43.270s

OK

krizhanovsky avatar Sep 04 '22 20:09 krizhanovsky