Problems generating corefiles with WSL2

Open paul-haskell opened this issue 1 year ago • 35 comments

Discussed in https://github.com/microsoft/WSL/discussions/11992

Originally posted by paul-haskell, September 3, 2024

Hi all -- I am trying to generate a corefile under WSL2.

  1. I disabled apport (sudo service apport stop)
  2. I set the kernel.core_pattern appropriately (sudo sysctl kernel.core_pattern=core.%e.%p)
  3. I set the core file size limit to 'unlimited'
  4. I verified the current directory is writable by all
  5. I ran a few simple programs that should dump core (abort(), integer divide-by-0)

...but I never get a core. I do reliably get a corefile on a native Ubuntu machine.

Does anyone have any ideas to try? Thanks!

(I am running WSL version 2.2.4.0 with default Ubuntu i.e. 22.04.3 LTS. I am running Windows version 10.0.22631.4037 .)
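The core-size part of the setup above can also be handled inside the test program itself. The sketch below is an illustration, not code from this thread: a hypothetical helper, `enable_core_dumps()`, raises the process's own soft RLIMIT_CORE to its hard limit. When the hard limit is unlimited this has the same effect as `ulimit -c unlimited`, but only for that one process; it does not touch kernel.core_pattern, which still decides where (and whether) a core is written.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not from the thread): raise this process's soft
 * core-file size limit to its hard limit. If the hard limit is
 * RLIM_INFINITY, this matches `ulimit -c unlimited` for this process only.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;   /* an unprivileged process may raise soft up to hard */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

A makeCore-style test would call `enable_core_dumps()` at the top of `main` and then `abort()`, which raises SIGABRT (signal 6 on Linux).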

paul-haskell, Sep 05 '24 20:09

Logs are required for review from WSL team

If this is a feature request, please reply with '/feature'. If this is a question, reply with '/question'. Otherwise, please attach logs by following the instructions below; your issue will not be reviewed unless they are added. These logs will help us understand what is going on on your machine.

How to collect WSL logs

Download and execute collect-wsl-logs.ps1 in an administrative powershell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

If this is a networking issue, please use collect-networking-logs.ps1, following the instructions here

Once completed please upload the output files to this Github issue.

If you choose to email these logs instead of attaching them to the issue, please send them to [email protected] with the GitHub issue number in the subject and a link to your comment in the message body, then reply with '/emailed-logs'.


github-actions[bot], Sep 05 '24 20:09

I already tried the fixes in #1754.

paul-haskell, Sep 05 '24 21:09

Diagnostic information
Detected appx version: 2.2.4.0

github-actions[bot], Sep 05 '24 21:09

@paul-haskell is systemd-coredump installed? Your coredumps might be in the journal. Is there any output when you run coredumpctl list?

Also, if gdb is attached to a process, running generate-core-file does create a core dump, here for process 316:

zcobol@toto:~$ file core.316
core.316: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '-bash', real uid: 1002, effective uid: 1002, real gid: 1002, effective gid: 1002, execfn: '/bin/bash', platform: 'x86_64'

The kernel.core_pattern was not modified. This is the default:

zcobol@toto:~$ sysctl kernel.core_pattern
kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h
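The leading | in this default value matters. Per the core(5) man page, when kernel.core_pattern starts with a pipe symbol, the kernel does not write a file itself: it runs the named program and passes the core image to it on standard input. A plain pattern such as core.%e is instead written by the kernel, relative to the crashing process's current working directory. The sketch below uses hypothetical helper names (not part of any WSL or systemd API) to read the active pattern the same way sysctl reports it and to tell the two modes apart.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the active pattern from
 * /proc/sys/kernel/core_pattern (what `sysctl kernel.core_pattern` shows).
 * Returns 0 on success, -1 on failure. */
int read_core_pattern(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    if (line == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}

/* Returns 1 if the pattern pipes the core to a helper program
 * (e.g. systemd-coredump or /wsl-capture-crash), 0 if the kernel
 * writes a file itself. */
int core_pattern_is_pipe(const char *pattern)
{
    return pattern[0] == '|';
}
```

With either default pattern quoted in this thread, `read_core_pattern(buf, sizeof buf) == 0 && core_pattern_is_pipe(buf)` would be true.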

zcobol, Sep 06 '24 02:09

Hi there,

systemd-coredump is not installed. When I try to run 'coredumpctl' I get the message:

Command 'coredumpctl' not found, but can be installed with:

sudo apt install systemd-coredump

paul-haskell, Sep 06 '24 04:09

When I run "sysctl -a | grep core_pattern" on my WSL instance, I get: /mnt/wslg/dumps/core.%e

/mnt/wslg/dumps is empty, even after I run my core-making program. The directory's file permissions are drwxrwxrwx

paul-haskell, Sep 08 '24 20:09

In wsl-2.3.17 the value of core_pattern is different:

elsaco@eleven:~/test$ sysctl kernel.core_pattern
kernel.core_pattern = |/wsl-capture-crash %t %E %p %s

Running strings on /init shows:

elsaco@eleven:~/test$ strings /init | grep crash
<3>WSL (%d) ERROR: %s:%u: Received error while trying to capture crash dump: %u
<6>WSL (%d): Capturing crash for pid: %s, executable: %s, signal: %s, port: %u
<3>WSL (%d) ERROR: %s:%u: Error while trying read crash dump from stdin, %u
/wsl-capture-crash
wsl-capture-crash
|/wsl-capture-crash %t %E %p %s
crash-dump

So the pattern appears to be hardcoded into WSL's own init.

A simple divide-by-zero test does produce a crash dump:

elsaco@eleven:~/test$ ./zero
Floating point exception (core dumped)

and the trace shows in the dmesg output:

[20108.624055] traps: zero[12868] trap divide error ip:563cda3a4184 sp:7ffe113bb0a0 error:0 in zero[563cda3a4000+1000]
[20108.624065] potentially unexpected fatal signal 8.
[20108.624066] CPU: 0 PID: 12868 Comm: zero Not tainted 5.15.153.1-microsoft-standard-WSL2 #1
[20108.624068] RIP: 0033:0x563cda3a4184
[20108.624071] Code: 00 75 07 b8 ff ff ff ff eb 07 8b 45 fc 99 f7 7d f8 5d c3 f3 0f 1e fa 55 48 89 e5 48 83 ec 10 b8 0a 00 00 00 b9 00 00 00 00 99 <f7> f9 89 45 f4 b8 00 00 00 00 b9 00 00 00 00 99 f7 f9 89 45 f8 8b
[20108.624072] RSP: 002b:00007ffe113bb0a0 EFLAGS: 00010206
[20108.624073] RAX: 000000000000000a RBX: 00007ffe113bb1d8 RCX: 0000000000000000
[20108.624074] RDX: 0000000000000000 RSI: 00007ffe113bb1d8 RDI: 0000000000000001
[20108.624075] RBP: 00007ffe113bb0b0 R08: 0000000000000000 R09: 00007fa6ffd20380
[20108.624076] R10: 00007ffe113badd0 R11: 0000000000000203 R12: 0000000000000001
[20108.624076] R13: 0000000000000000 R14: 0000563cda3a6dc0 R15: 00007fa6ffd53000
[20108.624077] FS:  00007fa6ffafe740 GS:  0000000000000000
[20108.624514] WSL (12869): Capturing crash for pid: 10759, executable: !home!elsaco!test!zero
[20108.624516] , signal: 8, port: 50005

and journalctl

Sep 08 21:00:46 eleven kernel: traps: zero[12868] trap divide error ip:563cda3a4184 sp:7ffe113bb0a0 error:0 in zero[563>
Sep 08 21:00:46 eleven kernel: potentially unexpected fatal signal 8.
Sep 08 21:00:46 eleven kernel: CPU: 0 PID: 12868 Comm: zero Not tainted 5.15.153.1-microsoft-standard-WSL2 #1
Sep 08 21:00:46 eleven kernel: RIP: 0033:0x563cda3a4184
Sep 08 21:00:46 eleven kernel: Code: 00 75 07 b8 ff ff ff ff eb 07 8b 45 fc 99 f7 7d f8 5d c3 f3 0f 1e fa 55 48 89 e5 4>
Sep 08 21:00:46 eleven kernel: RSP: 002b:00007ffe113bb0a0 EFLAGS: 00010206
Sep 08 21:00:46 eleven kernel: RAX: 000000000000000a RBX: 00007ffe113bb1d8 RCX: 0000000000000000
Sep 08 21:00:46 eleven kernel: RDX: 0000000000000000 RSI: 00007ffe113bb1d8 RDI: 0000000000000001
Sep 08 21:00:46 eleven kernel: RBP: 00007ffe113bb0b0 R08: 0000000000000000 R09: 00007fa6ffd20380
Sep 08 21:00:46 eleven kernel: R10: 00007ffe113badd0 R11: 0000000000000203 R12: 0000000000000001
Sep 08 21:00:46 eleven kernel: R13: 0000000000000000 R14: 0000563cda3a6dc0 R15: 00007fa6ffd53000
Sep 08 21:00:46 eleven kernel: FS:  00007fa6ffafe740 GS:  0000000000000000
Sep 08 21:00:46 eleven unknown: WSL (12869): Capturing crash for pid: 10759, executable: !home!elsaco!test!zero
Sep 08 21:00:46 eleven unknown: , signal: 8, port: 50005

However, I can't figure out this entry: "WSL: Capturing crash for pid:". Where does wsl-capture-crash store the actual core file?
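The !-separated path in these log lines is how core(5) defines the %E specifier: the executable's pathname with each / replaced by !. The sketch below is a hypothetical, simplified expander for the specifiers that appear in this thread (%e, %E, %p, %s, %t); it illustrates the man page's rules and is not WSL's or the kernel's code. One caveat: core(5) defines %p as the PID as seen in the crashing process's own PID namespace, which is a plausible reason why the kernel trap message reports PID 12868 while the WSL line reports pid 10759.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified expander for a few core(5) specifiers.
 * Simplification: %e is taken from the path's basename; the kernel
 * actually uses the process's comm name, which may be truncated to 15
 * characters. Returns 0 on success, -1 on overflow or on a specifier
 * this sketch does not handle (the kernel would silently drop it). */
int expand_core_pattern(const char *pattern, const char *exe_path, long pid,
                        int sig, long long when, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    for (const char *c = pattern; *c != '\0'; c++) {
        char piece[4096];

        if (*c != '%') {
            piece[0] = *c;
            piece[1] = '\0';
        } else if (c[1] == '\0') {
            break;                          /* core(5): a single trailing % is dropped */
        } else {
            c++;
            switch (*c) {
            case 'e': {                     /* executable name */
                const char *base = strrchr(exe_path, '/');
                snprintf(piece, sizeof piece, "%s", base ? base + 1 : exe_path);
                break;
            }
            case 'E':                       /* pathname with '/' replaced by '!' */
                snprintf(piece, sizeof piece, "%s", exe_path);
                for (char *q = piece; *q != '\0'; q++)
                    if (*q == '/')
                        *q = '!';
                break;
            case 'p': snprintf(piece, sizeof piece, "%ld", pid);   break; /* PID */
            case 's': snprintf(piece, sizeof piece, "%d", sig);    break; /* signal */
            case 't': snprintf(piece, sizeof piece, "%lld", when); break; /* epoch seconds */
            case '%': strcpy(piece, "%"); break;
            default:  return -1;            /* not handled by this sketch */
            }
        }
        size_t n = strlen(piece);
        if (used + n >= outlen)             /* keep room for the terminating NUL */
            return -1;
        memcpy(out + used, piece, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```

For example, expanding core.%e for /mnt/c/phaskell/CS221/Private/ClassDays/Day17/makeCore gives core.makeCore, and expanding the WSL pattern for /home/elsaco/test/zero produces an argument string in the same ! form seen in dmesg. The numeric inputs are only sample values.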

elsaco, Sep 09 '24 04:09

In wsl-2.3.17, core dumps are stored in the \AppData\Local\Temp\wsl-crashes folder under your Windows home directory. You'll see entries like this when running dmesg:

WSL (573): Capturing crash for pid: 366, executable: !home!zcobol!test!zero, signal:8, port: 50005

WSL captures the crash and writes the dump to the wsl-crashes folder.

Sample file:

PS C:\Users\valli>\AppData\Local\Temp\wsl-crashes\wsl-crash-1726372480-366-_home_zcobol_test_zero-8.dmp
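The dump names quoted in this thread appear to follow the layout wsl-crash-&lt;time&gt;-&lt;pid&gt;-&lt;executable path with / replaced by _&gt;-&lt;signal&gt;.dmp. That layout is inferred from those samples only, not from WSL documentation, so the hypothetical parser below is a sketch that may need adjusting for other WSL versions. It splits the signal off at the last -, so dashes inside the executable name survive.

```c
#include <stdlib.h>
#include <string.h>

/* Fields of a WSL crash-dump file name, under the layout assumed above. */
struct wsl_crash_name {
    long long time;   /* first number, apparently a Unix timestamp */
    long pid;         /* second number */
    int signal;       /* number just before ".dmp" */
    char exe[256];    /* executable path with '/' replaced by '_' */
};

/* Hypothetical parser. Returns 0 on success, -1 if `name` does not match
 * the assumed layout. */
int parse_wsl_crash_name(const char *name, struct wsl_crash_name *out)
{
    static const char prefix[] = "wsl-crash-";
    static const char suffix[] = ".dmp";
    size_t len = strlen(name);
    size_t plen = sizeof prefix - 1, slen = sizeof suffix - 1;
    const char *start;
    char *end;

    if (len <= plen + slen || strncmp(name, prefix, plen) != 0 ||
        strcmp(name + len - slen, suffix) != 0)
        return -1;

    start = name + plen;
    out->time = strtoll(start, &end, 10);
    if (end == start || *end != '-')
        return -1;

    start = end + 1;
    out->pid = strtol(start, &end, 10);
    if (end == start || *end != '-')
        return -1;

    const char *exe = end + 1;
    const char *last_dash = strrchr(name, '-');   /* the signal follows the last '-' */
    if (last_dash < exe)
        return -1;

    start = last_dash + 1;
    out->signal = (int)strtol(start, &end, 10);
    if (end == start || end != name + len - slen)
        return -1;

    size_t exe_len = (size_t)(last_dash - exe);
    if (exe_len >= sizeof out->exe)
        return -1;
    memcpy(out->exe, exe, exe_len);
    out->exe[exe_len] = '\0';
    return 0;
}
```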

Run sysctl kernel.core_pattern; if you haven't changed the setting, it should look like this:

zcobol@texas:~$ sysctl kernel.core_pattern
kernel.core_pattern = |/wsl-capture-crash %t %E %p %s

Using systemd-coredump didn't work because it would kill init:

systemd-coredump[544]: Due to PID 1 having crashed coredump collection will now be turned off

zcobol, Sep 15 '24 04:09

I checked my system: I do not have a ***\AppData\Local\Temp\wsl-crashes directory. (I do have ***\AppData\Local\Temp) My dmesg output does not show any "Capturing crash" messages. My "sysctl kernel.core_pattern" shows "/mnt/wslg/dumps/core.%e". And I do not have any files in /mnt/wslg/dumps, though I do have that directory.

paul-haskell, Sep 16 '24 05:09

What @zcobol and @elsaco said is right. We did add logic to capture core dumps in 2.3.17. The default path is %tmp%\wsl-crashes.

You can override the crash dump folder in the [wsl2] section of .wslconfig:

[wsl2]
crashDumpFolder=C:\\path\\to\\folder

And you can completely disable this behavior via:

[wsl2]
maxCrashDumpCount=1

This will completely prevent WSL from touching core_pattern, which should allow you to set your own custom path.

Let me know if this helps you collect core dumps!

OneBlue, Sep 16 '24 19:09

@paul-haskell: You most likely have an older build installed. Try running: wsl --update --pre-release to get the latest.

OneBlue, Sep 17 '24 01:09

@OneBlue, thanks for your message -- I am a lot closer after upgrading to WSL 2.3.17.

First, I ran with the default kernel.core_pattern of "|/wsl-capture-crash %t %E %p %s". When I ran my program that calls abort(), I did not have a .../AppData/Local/Temp/wsl-crashes directory created.

Next, I tried:

  1. ulimit -c unlimited
  2. sudo sysctl kernel.core_pattern=/mnt/c/Users/phaskell/AppData/Temp/core.%e
  3. Ran my program that calls abort(), and I got a corefile! But it was empty, i.e. 0 bytes. Same result when I repeated the tests.

Any ideas why my corefiles are empty?
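When a core file appears but is 0 bytes, a setup or grading script can at least tell an empty file from a real dump. A Linux core file is an ELF file whose header type is ET_CORE, which is what the `file` output earlier in the thread reports ("ELF 64-bit LSB core file"). The sketch below is a hypothetical helper built on the glibc &lt;elf.h&gt; definitions; it only inspects the header, so it cannot say why a dump came out empty, and it makes no claim about the .dmp files in wsl-crashes.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `path` starts with a 64-bit ELF core
 * header, 0 if it does not (including a 0-byte file), and -1 if the file
 * cannot be opened. The header is read in host byte order. */
int is_elf64_core(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr hdr;
    size_t n = fread(&hdr, 1, sizeof hdr, f);
    fclose(f);

    if (n < sizeof hdr)
        return 0;                                   /* empty or truncated */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;                                   /* no \x7fELF magic */
    return hdr.e_ident[EI_CLASS] == ELFCLASS64 && hdr.e_type == ET_CORE;
}
```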

paul-haskell, Sep 17 '24 06:09

@paul-haskell: Can you collect /logs of this happening (for both scenarios) ?

OneBlue, Sep 20 '24 19:09

Here are the requested logs for the second scenario, i.e. with kernel.core_pattern=core.%e. Thanks for looking. (I will upload the other logs shortly.) WslLogs-2024-09-20_14-33-52.zip

paul-haskell, Sep 20 '24 21:09

Diagnostic information
Detected appx version: 2.3.17.0

github-actions[bot], Sep 20 '24 21:09

Here are the logs for the first scenario (kernel.core_pattern=|/wsl-capture-crash %t %E %p %s ) WslLogs-2024-09-20_14-40-21.zip

paul-haskell, Sep 20 '24 21:09

Diagnostic information
Detected appx version: 2.3.17.0

github-actions[bot], Sep 20 '24 21:09

Thank you @paul-haskell. Looking at the logs, I see that a crash dump is generated:

Microsoft.Windows.Lxss.Manager	LinuxCrash	09-20-2024 14:40:57.101	"	"	"FullPath: 	C:\Users\phaskell\AppData\Local\temp\wsl-crashes\wsl-crash-1726868457-485-_mnt_c_phaskell_CS221_Private_ClassDays_Day17_makeCore-6.dmp
Pid: 	485
Signal: 	6
process: 	!mnt!c!phaskell!CS221!Private!ClassDays!Day17!makeCore
wslVersion: 	2.3.17.0"				4996	14140	5		00000000-0000-0000-0000-000000000000		

Can you check the contents of C:\Users\phaskell\AppData\Local\temp\wsl-crashes\?

OneBlue, Sep 20 '24 22:09

I do see a core in .../Local/Temp/wsl-crashes and it is nonempty. So "case 1" works! Thank you. Any idea why "case 2" i.e. overridden kernel.core_pattern only creates empty corefiles? (The reason I care is because I am teaching a class on system programming, and I want to make it easy for students on Windows and Mac platforms to be able to debug with corefiles. If I can get the corefiles in the current directory via some configuration script, it will make the students' lives easy.)

paul-haskell, Sep 20 '24 23:09

@paul-haskell: Does disabling systemd and restarting the distro help with case 2?

OneBlue, Sep 20 '24 23:09

I did a quick check, and I have 159 services managed by systemd. systemd manages all the startup services with Ubuntu, right? Can I really stop all of them?

(I tried stopping apport.service and setting kernel.core_pattern=core.%e but I still get empty corefiles.)

paul-haskell, Sep 20 '24 23:09

Can you try setting

[boot]
systemd=false

in /etc/wsl.conf?

OneBlue, Sep 20 '24 23:09

Ok, I did that test: In /etc/wsl.conf I set systemd=false, and I restarted my Ubuntu.

The system boots really quickly now. Unfortunately my corefiles are still empty. I'll attach another log to the case.

paul-haskell, Sep 20 '24 23:09

WslLogs-2024-09-20_16-56-24.zip

Here are the logs with systemd=false in wsl.conf and with kernel.core_pattern=core.%e (and with empty corefiles)

paul-haskell, Sep 20 '24 23:09

Diagnostic information
Detected appx version: 2.3.17.0

github-actions[bot], Sep 21 '24 00:09

Thank you @paul-haskell. I wonder if this could be a namespace issue. What if you try writing the dumps to /mnt/wsl instead?

OneBlue, Sep 23 '24 16:09

Interesting!

This time I did:

  • sudo sysctl kernel.core_pattern=/mnt/wsl/core.%e
  • ran my 'makeCore' program

I did get a nonempty corefile in /mnt/wsl called "core.makeCore".

Do you understand what the "namespace issue" is, and if so, could you explain it to me?

Thanks very much

Paul

paul-haskell, Sep 24 '24 05:09

OK, that's good to know. The short version is that WSL creates different mount namespaces for distributions (and even within a distribution in some use cases). /mnt/wsl is a mount point that all namespaces share, so it's often the best way to find out whether these mount namespaces are getting in the way.

It's a bit unclear to me why you see dumps with 0 bytes, though. You might need to experiment with core_pattern to get to the bottom of that, but in the meantime, using /mnt/wsl should at least give you a way to get the dump.
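One way to observe the mount namespaces described above is the rule from the namespaces(7) man page: two processes are in the same mount namespace when their /proc/&lt;pid&gt;/ns/mnt entries have the same device ID and inode number. The hypothetical helper below applies that rule. It is a diagnostic sketch only; examining another user's process (for example PID 1) generally requires root, because access to these files is permission-checked.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), for callers comparing against themselves */

/* Hypothetical helper: returns 1 if processes a and b are in the same mount
 * namespace, 0 if not, and -1 if either /proc/<pid>/ns/mnt cannot be
 * examined (no such process, or insufficient permission). */
int same_mount_namespace(pid_t a, pid_t b)
{
    char path[64];
    struct stat sa, sb;

    snprintf(path, sizeof path, "/proc/%ld/ns/mnt", (long)a);
    if (stat(path, &sa) != 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%ld/ns/mnt", (long)b);
    if (stat(path, &sb) != 0)
        return -1;
    /* namespaces(7): same namespace <=> same device ID and inode number */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, running `same_mount_namespace(getpid(), 1)` as root inside a distribution reports whether the caller shares a mount namespace with that distribution's PID 1. A mismatch alone would not explain the empty files.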

OneBlue, Sep 24 '24 16:09

True enough—I can get cores in a decent enough place. Thanks for all your help!

I agree it’s mysterious why I can get corefiles created but empty, with my original core_pattern.

Regards

Paul

paul-haskell, Sep 26 '24 06:09