Kernel panic caused by a bug in the “val_to_ring” function, crashing the host machine
Describe the bug
[1047486.856617] falco: deallocating consumer ffff9aba94a2a0e0
[1047486.938918] BUG: unable to handle kernel paging request at ffffac4fe972383e
[1047486.943701] falco: no more consumers, stopping capture
[1047486.943583] IP: [<ffffffffc0b6dd70>] val_to_ring+0x80/0x460 [falco]
[1047486.950758] PGD 179982067 PUD 179983067 PMD 4f0fee067 PTE 0
[1047486.955568] Oops: 0002 [#1] SMP
[1047486.973052] Modules linked in: udp_diag binfmt_misc falco(OE) veth ipt_rpfilter vxlan ip6_udp_tunnel udp_tunnel xt_set xt_multiport ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel ip6t_MASQUERADE nf_nat_masquerade_ipv6 xt_statistic xt_nat ipt_REJECT nf_reject_ipv4 ip6table_filter ip6table_mangle ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_tables iptable_raw xt_CT dummy rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ip6table_nat ip6_tables iptable_mangle xt_comment xt_mark tcp_diag inet_diag xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat br_netfilter bridge stp llc overlay(T) openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack
[1047487.021023] ppdev cirrus ttm iosf_mbi drm_kms_helper crc32_pclmul syscopyarea sysfillrect sysimgblt ghash_clmulni_intel fb_sys_fops drm aesni_intel joydev lrw gf128mul drm_panel_orientation_quirks glue_helper ablk_helper virtio_balloon i2c_piix4 parport_pc cryptd parport pcspkr drbd_transport_tcp(OE) drbd(OE) ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk scsi_transport_iscsi ata_piix libata crct10dif_pclmul crct10dif_common virtio_pci virtio_ring crc32c_intel serio_raw virtio floppy sunrpc dm_mirror dm_region_hash dm_log dm_mod
[1047487.052673] CPU: 4 PID: 55365 Comm: find Kdump: loaded Tainted: G OE ------------ T 3.10.0-1062.9.1.el7.x86_64 #1
[1047487.073362] Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
[1047487.077771] task: ffff9ab58f82d230 ti: ffff9ab949200000 task.ti: ffff9ab949200000
[1047487.082308] RIP: 0010:[<ffffffffc0b6dd70>] [<ffffffffc0b6dd70>] val_to_ring+0x80/0x460 [falco]
[1047487.087522] RSP: 0018:ffff9ab949203b80 EFLAGS: 00010287
[1047487.091793] RAX: 000000000000001e RBX: ffff9ab949203d98 RCX: 0000000000000000
[1047487.096184] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffac4fe9723818
[1047487.100793] RBP: ffff9ab949203bc0 R08: 0000000000000000 R09: 0000000000000098
[1047487.105080] R10: 0000000000000001 R11: 0000000000000246 R12: 000000000000fde8
[1047487.109184] R13: 0000000000000000 R14: 0000000000000001 R15: ffffac4fe972383e
[1047487.113464] FS: 0000000000000000(0000) GS:ffff9abb3fd00000(0000) knlGS:0000000000000000
[1047487.118294] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1047487.122034] CR2: ffffac4fe972383e CR3: 000000054ee06000 CR4: 00000000001606e0
[1047487.127158] Call Trace:
[1047487.129721] [<ffffffffa2104262>] ? security_inode_permission+0x22/0x30
[1047487.134438] [<ffffffffa20565b2>] ? __inode_permission+0x52/0xd0
[1047487.138980] [<ffffffffc0b7058d>] f_proc_startupdate+0x77d/0x1250 [falco]
[1047487.144099] [<ffffffffa2588a26>] ? trace_do_page_fault+0x56/0x150
[1047487.149122] [<ffffffffc0b6b576>] record_event_consumer+0x4b6/0xdf0 [falco]
[1047487.154468] [<ffffffffa204737a>] ? __check_object_size+0x1ca/0x250
[1047487.173196] [<ffffffffa25789b1>] ? create_elf_tables+0x542/0x56d
[1047487.177178] [<ffffffffc0b6bf24>] record_event_all_consumers+0x74/0xb0 [falco]
[1047487.181531] [<ffffffffc0b6c27d>] syscall_exit_probe+0xed/0x120 [falco]
[1047487.185909] [<ffffffffa1e3c22d>] syscall_trace_leave+0xfd/0x110
[1047487.189593] [<ffffffffa258e220>] int_check_syscall_exit_work+0x13/0x1c
[1047487.193456] Code: 46 e2 8b 53 34 48 c1 e0 06 4c 29 c8 49 89 f6 48 69 d2 30 07 00 00 48 8d 94 10 40 1a b8 c0 8b 42 50 83 f8 1b 74 25 31 d2 83 f8 2e <66> 41 89 17 0f 87 8c 02 00 00 48 8b 04 c5 10 16 b8 c0 e9 89 4d
[1047487.207611] RIP [<ffffffffc0b6dd70>] val_to_ring+0x80/0x460 [falco]
[1047487.213099] RSP <ffff9ab949203b80>
[1047487.216526] CR2: ffffac4fe972383e
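A note for readers triaging the trace above: the value after "Oops:" is the x86 page-fault error code, and the byte sequence marked in the Code line (<66> 41 89 17) decodes to a 16-bit store through R15, whose value matches CR2. The sketch below decodes the error-code bits. It is only an illustration: the helper name decode_page_fault_error is hypothetical, and it covers just the five low bits defined by the x86 architecture.

```python
# Bit meanings of the x86 page-fault error code.
# Each entry: (mask, meaning when set, meaning when clear or None).
PF_BITS = (
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
)


def decode_page_fault_error(code: int) -> list[str]:
    """Return a human-readable description for each relevant bit."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


# "Oops: 0002" is taken from the kernel trace in this report.
kernel_oops = decode_page_fault_error(0x0002)
print(kernel_oops)
```

Under this reading, 0002 describes a kernel-mode write to an address with no page mapped, which agrees with the "PTE 0" entry in the page-table dump.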
How to reproduce it
Repeatedly reload the Falco process (send it a SIGHUP signal).
There is no reliable reproduction method, but according to the dmesg output, the anomaly occurred just as capture was being restarted for the second time.
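The reload loop described above could be driven by something like the following sketch. It is not a confirmed reproducer: the function name reload_repeatedly, the repetition count, and the interval are all assumptions chosen for illustration, and the report itself says the crash does not reproduce reliably.

```python
import os
import signal
import time
from typing import Callable


def reload_repeatedly(
    pid: int,
    count: int,
    interval_s: float,
    send: Callable[[int, int], None] = os.kill,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send SIGHUP to `pid` up to `count` times; return how many were sent.

    `send` and `sleep` are injectable so the loop can be exercised
    without touching a real process.
    """
    sent = 0
    for _ in range(count):
        try:
            send(pid, signal.SIGHUP)
        except ProcessLookupError:
            # The target no longer exists (for example after a segfault).
            break
        sent += 1
        sleep(interval_s)
    return sent
```

A call such as reload_repeatedly(falco_pid, count=50, interval_s=5.0) would send one reload every five seconds. The interval is a guess based on the dmesg excerpts, where each restart takes a few seconds to reach "starting capture".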
[744044.448687] falco: initializing ring buffer for CPU 0
[744044.650350] falco: CPU buffer initialized, size=134217728
[744044.664054] falco: initializing ring buffer for CPU 1
[744045.008622] falco: CPU buffer initialized, size=134217728
[744045.021499] falco: initializing ring buffer for CPU 2
[744045.208599] falco: CPU buffer initialized, size=134217728
[744045.225987] falco: initializing ring buffer for CPU 3
[744045.408549] falco: CPU buffer initialized, size=134217728
[744045.421837] falco: initializing ring buffer for CPU 4
[744045.646908] falco: CPU buffer initialized, size=134217728
[744045.659648] falco: initializing ring buffer for CPU 5
[744045.798287] falco: CPU buffer initialized, size=134217728
[744045.810377] falco: initializing ring buffer for CPU 6
[744046.151417] falco: CPU buffer initialized, size=134217728
[744046.162903] falco: initializing ring buffer for CPU 7
[744046.304424] falco: CPU buffer initialized, size=134217728
[744046.316198] falco: starting capture
[744398.712038] falco: deallocating consumer ffff9ab9fe618000
[744398.788622] falco: no more consumers, stopping capture
[744399.940525] falco: adding new consumer ffff9ab9fe618000
[744399.999193] falco: initializing ring buffer for CPU 0
[744400.199128] falco: CPU buffer initialized, size=134217728
[744400.211459] falco: initializing ring buffer for CPU 1
[744400.599133] falco: CPU buffer initialized, size=134217728
[744400.619842] falco: initializing ring buffer for CPU 2
[744400.899144] falco: CPU buffer initialized, size=134217728
[744400.913185] falco: initializing ring buffer for CPU 3
[744401.299113] falco: CPU buffer initialized, size=134217728
[744401.315185] falco: initializing ring buffer for CPU 4
[744401.599137] falco: CPU buffer initialized, size=134217728
[744401.611852] falco: initializing ring buffer for CPU 5
[744401.899065] falco: CPU buffer initialized, size=134217728
[744401.912112] falco: initializing ring buffer for CPU 6
[744402.159074] falco: CPU buffer initialized, size=134217728
[744402.176069] falco: initializing ring buffer for CPU 7
[744402.599085] falco: CPU buffer initialized, size=134217728
[744402.614843] falco: starting capture
[744606.011475] falco: deallocating consumer ffff9ab9fe618000
[744606.128370] falco: no more consumers, stopping capture
[744607.334996] falco: adding new consumer ffff9ab9fe618000
[744607.393689] falco: initializing ring buffer for CPU 0
[744607.593637] falco: CPU buffer initialized, size=134217728
[744607.613109] falco: initializing ring buffer for CPU 1
[744607.893716] falco: CPU buffer initialized, size=134217728
[744607.907875] falco: initializing ring buffer for CPU 2
[744608.293588] falco: CPU buffer initialized, size=134217728
[744608.307129] falco: initializing ring buffer for CPU 3
[744608.593622] falco: CPU buffer initialized, size=134217728
[744608.606678] falco: initializing ring buffer for CPU 4
[744608.793564] falco: CPU buffer initialized, size=134217728
[744608.810289] falco: initializing ring buffer for CPU 5
[744609.093557] falco: CPU buffer initialized, size=134217728
[744609.112077] falco: initializing ring buffer for CPU 6
[744609.397101] falco: CPU buffer initialized, size=134217728
[744609.414630] falco: initializing ring buffer for CPU 7
[744610.093546] falco: CPU buffer initialized, size=134217728
[744610.105775] falco: starting capture
[744613.235171] falco[977062]: segfault at 14 ip 0000000000dec8b0 sp 00007ffe8256f8d8 error 4 in falco[400000+f7e000]
.
.
.
[748207.518699] falco: initializing ring buffer for CPU 7
[748207.797835] falco: CPU buffer initialized, size=134217728
[748207.813321] falco: starting capture
[748211.141318] falco[1520458]: segfault at 14 ip 0000000000dec8b0 sp 00007ffcef650028 error 4 in falco[400000+f7e000]
.
.
.
[755863.333935] falco: CPU buffer initialized, size=134217728
[755863.348515] falco: initializing ring buffer for CPU 7
[755863.490289] falco: CPU buffer initialized, size=134217728
[755863.504481] falco: starting capture
[755866.915734] falco[1838351]: segfault at 14 ip 0000000000dec8b0 sp 00007ffcfddaff18 error 4 in falco[400000+f7e000]
[803688.923205] falco: CPU buffer initialized, size=134217728
[803688.926634] falco: initializing ring buffer for CPU 7
[803689.123176] falco: CPU buffer initialized, size=134217728
[803689.139546] falco: starting capture
[803692.449034] traps: falco[3458365] general protection ip:dec8b0 sp:7ffe6d4c4958 error:0 in falco[400000+f7e000]
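The three user-space segfault lines above share the same fault address (14) and instruction pointer (dec8b0), and the final general protection fault reports the same instruction pointer. A small parser makes that comparison easier across many lines. This is a sketch only: parse_segfault is a hypothetical helper, and the regular expression assumes the exact line layout shown in this report.

```python
import re

# Matches lines such as:
#   falco[977062]: segfault at 14 ip 0000000000dec8b0 sp ... error 4 in falco[400000+f7e000]
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line: str) -> dict:
    """Extract the numeric fields of a kernel 'segfault at' log line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        # Distance of the faulting instruction from the start of the mapping.
        "offset_in_mapping": ip - base,
        "error_code": int(m["err"], 16),
        "object": m["obj"],
    }


line = ("[744613.235171] falco[977062]: segfault at 14 ip 0000000000dec8b0 "
        "sp 00007ffe8256f8d8 error 4 in falco[400000+f7e000]")
info = parse_segfault(line)
print(hex(info["fault_addr"]), hex(info["offset_in_mapping"]))
```

A fault address as low as 0x14 is consistent with a read through a null or nearly null pointer, although the log alone cannot confirm that. Whether the absolute instruction pointer or the offset into the mapping is the right input for a symbolizer depends on how the falco binary was linked, which the report does not state.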
Expected behaviour
Screenshots
Environment
kernel module
Linux ecs-sit-0002 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Falco version:
falco version=0.33.0-22+c353bb4, driver version=3.0.1+driver, arch=x86_64, kernel release=3.10.0-1062.9.1.el7.x86_64, kernel version=1
- System info:
root@ecs-sit-0002:/# /usr/bin/falco --support
Wed Sep 20 04:04:38 2023: Falco version 0.33.0-22+c353bb4 (x86_64)
Wed Sep 20 04:04:38 2023: Falco initialized with configuration file /etc/holmes/holmes.yaml
Segmentation fault (core dumped)
- Cloud provider or hardware configuration:
- OS:
[root@ecs-sit-0002 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
- Kernel:
Linux ecs-sit-0002 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Installation method:
Kubernetes
Additional context
Hey @Spartan-65, sorry about that! Would you mind testing the latest Falco version (https://github.com/falcosecurity/falco/releases/tag/0.35.1), just to see whether the issue is still there?
Sorry, operations engineers are not allowed to redeploy Falco to this environment until we identify the root cause of the issue.
ok makes sense, don't worry!
Repeatedly reload the Falco process (send it a SIGHUP signal). There is no reliable reproduction method, but according to the dmesg output, the anomaly occurred just as capture was being restarted for the second time.
We will try to reproduce the issue using the repro you suggested
We weren't able to repro this :/ Moving to the next milestone. Hopefully we will be able to tackle this one.
/milestone 0.17.0
/milestone 0.18.0
We still had no luck in reproducing this.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Provide feedback via https://github.com/falcosecurity/community.
/close
@poiana: Closing this issue.