Get a "Stack pointer corruption" when using LKRG on a system with nodejs

Open gnd opened this issue 3 years ago • 7 comments

Hello, we recently added lkrg to the mix on one of our machines, and it seems like there might be a problem. Every now and then I see this in dmesg:

[Wed Jan 13 04:53:45 2021] [p_lkrg] <Exploit Detection> Not valid call - pCFI violation: process[write_gcm GSD | 21281] !!!
[Wed Jan 13 04:53:45 2021] [p_lkrg] <Exploit Detection> Frame[2] nr_entries[4]: [0x742]. Full Stack below:
[Wed Jan 13 04:53:45 2021] [p_lkrg] <Exploit Detection> Trying to kill process[write_gcm GSD | 21281]!
[Wed Jan 13 04:53:45 2021] [p_lkrg] <Exploit Detection> Stack pointer corruption (ROP?) - pCFI violation: process[write_gcm GSD | 21281] !!!
[Wed Jan 13 04:53:45 2021] [p_lkrg] <Exploit Detection> Trying to kill process[write_gcm GSD | 21281]!
[Wed Jan 13 07:29:45 2021] [p_lkrg] <Exploit Detection> Not valid call - pCFI violation: process[write_gcm ATS | 31829] !!!
[Wed Jan 13 07:29:45 2021] [p_lkrg] <Exploit Detection> Frame[2] nr_entries[4]: [0x1]. Full Stack below:
[Wed Jan 13 07:29:45 2021] [p_lkrg] <Exploit Detection> Trying to kill process[write_gcm ATS | 31829]!
[Wed Jan 13 07:29:45 2021] [p_lkrg] <Exploit Detection> Stack pointer corruption (ROP?) - pCFI violation: process[write_gcm ATS | 31829] !!!
[Wed Jan 13 07:29:45 2021] [p_lkrg] <Exploit Detection> Trying to kill process[write_gcm ATS | 31829]!
[Wed Jan 13 07:45:57 2021] [p_lkrg] <Exploit Detection> Not valid call - pCFI violation: process[node | 926] !!!
[Wed Jan 13 07:45:57 2021] [p_lkrg] <Exploit Detection> Frame[2] nr_entries[4]: [0xd6db]. Full Stack below:
[Wed Jan 13 07:45:57 2021] [p_lkrg] <Exploit Detection> Trying to kill process[node | 926]!
[Wed Jan 13 07:45:57 2021] [p_lkrg] <Exploit Detection> Stack pointer corruption (ROP?) - pCFI violation: process[node | 926] !!!
[Wed Jan 13 07:45:57 2021] [p_lkrg] <Exploit Detection> Trying to kill process[node | 926]!
[Wed Jan 13 09:35:55 2021] [p_lkrg] <Exploit Detection> Not valid call - pCFI violation: process[node | 11231] !!!
[Wed Jan 13 09:35:55 2021] [p_lkrg] <Exploit Detection> Frame[2] nr_entries[4]: [0xc84f]. Full Stack below:
[Wed Jan 13 09:35:55 2021] [p_lkrg] <Exploit Detection> Trying to kill process[node | 11231]!
[Wed Jan 13 09:35:55 2021] [p_lkrg] <Exploit Detection> Stack pointer corruption (ROP?) - pCFI violation: process[node | 11231] !!!
[Wed Jan 13 09:35:55 2021] [p_lkrg] <Exploit Detection> Trying to kill process[node | 11231]!
[Wed Jan 13 17:35:53 2021] [p_lkrg] <Exploit Detection> Not valid call - pCFI violation: process[node | 29968] !!!
[Wed Jan 13 17:35:53 2021] [p_lkrg] <Exploit Detection> Frame[2] nr_entries[4]: [0x378a]. Full Stack below:
[Wed Jan 13 17:35:53 2021] [p_lkrg] <Exploit Detection> Trying to kill process[node | 29968]!
[Wed Jan 13 17:35:53 2021] [p_lkrg] <Exploit Detection> Stack pointer corruption (ROP?) - pCFI violation: process[node | 29968] !!!
[Wed Jan 13 17:35:53 2021] [p_lkrg] <Exploit Detection> Trying to kill process[node | 29968]!
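For anyone triaging a log like the one above, here is a minimal Node.js sketch that tallies pCFI violations per process name. The regular expression is inferred only from the message format quoted in this thread, so it may not match other LKRG versions; the function names `parseViolations` and `countByProcess` are made up for illustration.

```javascript
// Sketch only: the pattern is inferred from the log lines quoted in this thread.
// It expects: [p_lkrg] <Exploit Detection> <kind> - pCFI violation: process[<name> | <pid>]
const VIOLATION_RE =
  /\[p_lkrg\] <Exploit Detection> (.+?) - pCFI violation: process\[(.+?) \| (\d+)\]/;

// Return one record per "pCFI violation" line; other lines are ignored.
function parseViolations(logText) {
  const events = [];
  for (const line of logText.split('\n')) {
    const m = VIOLATION_RE.exec(line);
    if (m) events.push({ kind: m[1], comm: m[2], pid: Number(m[3]) });
  }
  return events;
}

// Count violations per process name (the part before " | <pid>").
function countByProcess(events) {
  const counts = {};
  for (const { comm } of events) counts[comm] = (counts[comm] || 0) + 1;
  return counts;
}

// Three lines copied from the report, one per line.
const sample = [
  '[Wed Jan 13 04:53:45 2021] [p_lkrg] <Exploit Detection> Not valid call - pCFI violation: process[write_gcm GSD | 21281] !!!',
  '[Wed Jan 13 04:53:45 2021] [p_lkrg] <Exploit Detection> Trying to kill process[write_gcm GSD | 21281]!',
  '[Wed Jan 13 07:45:57 2021] [p_lkrg] <Exploit Detection> Stack pointer corruption (ROP?) - pCFI violation: process[node | 926] !!!',
].join('\n');

console.log(countByProcess(parseViolations(sample)));
```

Feeding it the full output of dmesg, one message per line, should make it easier to see whether the violations cluster around a particular process.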

The system is a standard Debian Stretch (9.13) and 4.9.0-12-amd64 kernel. Some of these violations are triggered by nodejs, but not all of them.

Is there any way to get rid of these problems?

gnd avatar Jan 13 '21 17:01 gnd

Hi @gnd. Thank you for reporting this. What version of LKRG is this with? If it's anything other than the latest from this repo, then please upgrade and try again. If it is the latest, then please state so and we'll look into the issue. Thanks!

solardiz avatar Jan 13 '21 18:01 solardiz

Hi, it was an older build. I have recompiled with the latest master and still get the same issue:

[9178385.995027] [p_lkrg] LKRG initialized successfully!
[9178386.000400] Restarting tasks ... done.
[9183968.645736] [p_lkrg] <Exploit Detection> Not valid call - pCFI violation: process[write_gcm ATS | 8054] !!!
[9183968.657254] [p_lkrg] <Exploit Detection> Frame[2] nr_entries[4]: [0x1]. Full Stack below:
[9183968.665718] --- . ---
[9183968.668284] schedule+0x1/0x80
[9183968.671728] call_rwsem_down_read_failed+0x14/0x30
[9183968.678238] 0x1
[9183968.680405] 0xffffffff
[9183968.683136] --- END ---
[9183968.687376] [p_lkrg] <Exploit Detection> Trying to kill process[write_gcm ATS | 8054]!
[9183968.695628] [p_lkrg] <Exploit Detection> Stack pointer corruption (ROP?) - pCFI violation: process[write_gcm ATS | 8054] !!!
[9183968.708651] [p_lkrg] <Exploit Detection> Trying to kill process[write_gcm ATS | 8054]!
[9184169.987812] [p_lkrg] <Exploit Detection> Not valid call - pCFI violation: process[node | 8357] !!!
[9184169.997236] [p_lkrg] <Exploit Detection> Frame[2] nr_entries[4]: [0x163c]. Full Stack below:
[9184170.005971] --- . ---
[9184170.008550] schedule+0x1/0x80
[9184170.011908] call_rwsem_down_read_failed+0x14/0x30
[9184170.016980] 0x163c
[9184170.019358] 0x10000
[9184170.021821] --- END ---
[9184170.024546] [p_lkrg] <Exploit Detection> Trying to kill process[node | 8357]!
[9184170.031996] [p_lkrg] <Exploit Detection> Stack pointer corruption (ROP?) - pCFI violation: process[node | 8357] !!!
[9184170.042747] [p_lkrg] <Exploit Detection> Trying to kill process[node | 8357]!
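In both dumps above, the value reported for Frame[2] (0x1 and 0x163c) is the same as the third stack entry, which is a bare number rather than a symbol+offset. As a rough aid for spotting such entries, here is a small Node.js sketch that pulls the frames between the "--- . ---" and "--- END ---" markers and flags the ones that are not in symbol+offset/size form. The marker strings and frame format are taken from this thread only; the helper names are hypothetical, and this is not how LKRG itself validates frames.

```javascript
// Sketch only: marker and frame formats are copied from the dump in this thread.

// Drop a leading "[seconds.micros]" dmesg timestamp if there is one.
function stripTimestamp(line) {
  return line.replace(/^\[\s*\d+\.\d+\]\s*/, '');
}

// Collect each stack dump as an array of { text, resolved } frames.
// "resolved" means the entry looks like symbol+0xoff/0xsize.
function extractStacks(logText) {
  const stacks = [];
  let current = null;
  for (const raw of logText.split('\n')) {
    const line = stripTimestamp(raw).trim();
    if (line === '--- . ---') { current = []; continue; }
    if (line === '--- END ---') {
      if (current) stacks.push(current);
      current = null;
      continue;
    }
    if (current && line) {
      const resolved = /^[A-Za-z_][\w.]*\+0x[0-9a-f]+\/0x[0-9a-f]+$/.test(line);
      current.push({ text: line, resolved });
    }
  }
  return stacks;
}

// First stack dump from the report, one message per line.
const sample = [
  '[9183968.665718] --- . ---',
  '[9183968.668284] schedule+0x1/0x80',
  '[9183968.671728] call_rwsem_down_read_failed+0x14/0x30',
  '[9183968.678238] 0x1',
  '[9183968.680405] 0xffffffff',
  '[9183968.683136] --- END ---',
].join('\n');

for (const stack of extractStacks(sample)) {
  stack.forEach((f, i) => console.log(i, f.resolved ? 'symbol' : 'raw   ', f.text));
}
```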

gnd avatar Jan 13 '21 22:01 gnd

> The system is a standard Debian Stretch (9.13) and 4.9.0-12-amd64 kernel.

This kernel is a binary build that came with Debian, right? Or did you rebuild?

solardiz avatar Jan 13 '21 22:01 solardiz

Would you also be able to verify whether your kernel is compiled with CONFIG_UNWINDER_ORC? Can you confirm that you are not running LKRG on a VirtualBox host machine where you run guest VMs? I would also be thankful if you could tell me how I can reproduce the issue you are seeing: what the nodejs configuration is (I've never used it, so I don't have any knowledge about it), what else is needed, etc. I've done basic tests on Debian 9 (with kernel 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u2) using a basic nodejs app and I don't see any issues:

$ cat test/index.js 
const express = require('express')
const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`)
})

It might be related to the kernel config itself (and maybe non-standard kernel modules?) and the app itself.

Btw, just FYI: you can temporarily turn off the pCFI feature (until we investigate this issue). You can do it via the sysctl interface, e.g.:
# sysctl lkrg.pcfi_validate=0
You can also try 'weak' pCFI validation via:
# sysctl lkrg.pcfi_validate=1
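To check which setting is currently active without changing it, something like the following Node.js sketch could be used. It assumes the usual Linux convention that a sysctl named `a.b` is exposed as `/proc/sys/a/b`, and it only describes the two values mentioned in this thread (0 and 1); any other value is reported as-is. The helper names are hypothetical, and reading the file may require root.

```javascript
const fs = require('fs');

// Assumption: standard sysctl-to-procfs mapping (dots become slashes).
function sysctlPath(name) {
  return '/proc/sys/' + name.split('.').join('/');
}

// Only the two values mentioned in this thread are described here.
const PCFI_MODES = {
  0: 'pCFI validation turned off',
  1: "'weak' pCFI validation",
};

function describePcfi(value) {
  const known = PCFI_MODES[String(value).trim()];
  return known || `value ${String(value).trim()} (not described in this thread)`;
}

// Return the current value as a string, or null if it cannot be read
// (LKRG not loaded, or insufficient permissions).
function readSysctl(name) {
  try {
    return fs.readFileSync(sysctlPath(name), 'utf8').trim();
  } catch (err) {
    return null;
  }
}

const value = readSysctl('lkrg.pcfi_validate');
console.log(value === null
  ? 'lkrg.pcfi_validate is not available or not readable'
  : 'lkrg.pcfi_validate: ' + describePcfi(value));
```

This only reads the setting; changing it is still done with the sysctl commands above.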

Adam-pi3 avatar Jan 14 '21 01:01 Adam-pi3

Hi,

the kernel came with Debian and has not been rebuilt. I don't see CONFIG_UNWINDER_ORC in the kernel config:

$ sudo grep CONFIG_UNWINDER_ORC /boot/config-4.9.0-12-amd64
$
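An empty grep result means the option name does not appear in the config file at all, which is different from an option that is listed as "is not set". A small Node.js sketch that distinguishes these cases is below; `configOption` is a hypothetical helper, and the sample config text is made up for illustration rather than taken from the machine in this report.

```javascript
// Look up one option in kernel config text.
// Returns the value ("y", "m", ...), "not set" for a commented-out option,
// or null when the option name does not appear at all.
function configOption(configText, name) {
  for (const raw of configText.split('\n')) {
    const line = raw.trim();
    if (line.startsWith(name + '=')) return line.slice(name.length + 1);
    if (line === `# ${name} is not set`) return 'not set';
  }
  return null;
}

// Illustrative fragment only; NOT the reporter's actual kernel config.
const sampleConfig = [
  'CONFIG_FRAME_POINTER=y',
  '# CONFIG_DEBUG_INFO is not set',
].join('\n');

for (const opt of ['CONFIG_FRAME_POINTER', 'CONFIG_DEBUG_INFO', 'CONFIG_UNWINDER_ORC']) {
  console.log(opt, '->', configOption(sampleConfig, opt));
}
```

On a real system the text would come from the file under /boot, for example via `fs.readFileSync('/boot/config-4.9.0-12-amd64', 'utf8')`.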

The machine is a GCP instance. LKRG runs fine elsewhere on GCP on Deb 10 VMs. Unfortunately I can't share more info about the Nodejs apps because they are proprietary. One notable thing might be that the node apps use a lot of RAM (~20GB) shuffling a lot of data around.

Thanks for the pCFI hint, I will turn it off and let you know if that helps. Since this issue is hard to replicate, and since I suspect this kernel is older than the newest one available for Debian 9, I suggest we wait for a scheduled reboot (over the weekend) to see if a newer kernel solves it. If you have some tests you need me to run in the meantime, I will be happy to help. Thanks a lot for your help!

gnd avatar Jan 14 '21 09:01 gnd

@gnd I wonder if we should close this issue, any updates? @solardiz what do you think?

Adam-pi3 avatar Feb 22 '22 00:02 Adam-pi3

@Adam-pi3 Let's wait to hear from @gnd, but yes - without this issue having recently been reproduced by anyone, it doesn't look actionable for us.

solardiz avatar Feb 22 '22 18:02 solardiz