random crash, unexpected signal during runtime execution

Open 66RING opened this issue 4 years ago • 6 comments

I don't know how to reproduce it. It seems to crash randomly at runtime.

lf_crash

My environment:

  • OS: arch linux, 5.13.5
  • CPU: AMD Ryzen 5 3550H with Radeon Vega Mobile Gfx (8) @ 2.100GHz
  • GPU: AMD ATI 03:00.0 Picasso
  • window manager: bspwm

66RING avatar Jul 30 '21 00:07 66RING

@66RING Thanks for reporting. We should eliminate such crashes as much as possible. Unfortunately, there's not much we can do until we find the steps to reproduce. Feel free to add more info here if you come up with something. It's interesting that the stack trace in the screenshot does not seem to show anything in our code, only system calls. Also, can you confirm you're using the latest version?

gokcehan avatar Jul 31 '21 14:07 gokcehan

@gokcehan This is the full stack trace, copied directly from stdout. I tried to find lf's log file, but /tmp/lf.log only has history info. lf.log

66RING avatar Aug 01 '21 01:08 66RING

@66RING The backtrace shows some functions for user lookup. These are used to determine home directories for config file locations etc. Do you have something custom in your setup? The log file should be in /tmp/lf.${USER}.${id}.log. Do you have an empty USER variable in the environment?

gokcehan avatar Aug 01 '21 12:08 gokcehan

@gokcehan I start lf from a wrapper script, which combines Ueberzug previews and lfcd.

# lfcd.sh

__lf() {
  if [ -n "$DISPLAY" ]; then
    export FIFO_UEBERZUG="${TMPDIR:-/tmp}/lf-ueberzug-$$"

    cleanup() {
      exec 3>&-
      rm "$FIFO_UEBERZUG"
    }

    if [ -f "$FIFO_UEBERZUG" ]; then
      rm "$FIFO_UEBERZUG"
    fi
    mkfifo "$FIFO_UEBERZUG" 2> /dev/null
    ueberzug layer --silent <"$FIFO_UEBERZUG" &
    upid=$!
    exec 3>"$FIFO_UEBERZUG"
    trap cleanup EXIT

    if ! [ -d "$HOME/.cache/lf" ]; then
      mkdir -p "$HOME/.cache/lf"
    fi

    lf "$@" 3>&-
  else
    lf "$@"
  fi
}

lfcd () {
    tmp="$(mktemp)"
    export LF_BACK="$(pwd)"
    (__lf -last-dir-path="$tmp" "$@")
    if [ -f "$tmp" ]; then
        dir="$(cat "$tmp")"
        rm -f "$tmp"
        unset LF_BACK
        if [ -d "$dir" ]; then
            if [ "$dir" != "$(pwd)" ]; then
                cd "$dir"
            fi
        fi
    fi
}

Usage:

$ source ./lfcd.sh
$ lfcd

And my lfrc is:

set previewer /home/ring/.config/lf/scope.sh
set cleaner /home/ring/.config/lf/clear_img.sh
set shell bash
set shellopts '-eu'
set ifs "\n"
set scrolloff 10

scope.sh should be the one from ranger.

I checked /tmp/lf.${USER}.${id}.log. It records the command history. Maybe I can find the last command the next time the crash happens.

66RING avatar Aug 01 '21 14:08 66RING

It seems to happen at random

log file of recent crash:

$ cat lf.ring.5523.log
2021/08/07 10:45:51 hi!
2021/08/07 10:45:51 loading files: []
2021/08/07 10:45:51 reading file: /home/ring/.config/lf/lfrc
2021/08/07 10:45:51 loading files: []
2021/08/07 10:50:49 search: loca
2021/08/07 10:50:50 pushing keys 5k
$ cat lf.ring.14779.log
2021/08/03 10:42:06 hi!
2021/08/03 10:42:06 loading files: []
2021/08/03 10:42:06 reading file: /home/ring/.config/lf/lfrc
2021/08/03 10:42:06 loading files: []
2021/08/03 10:42:08 search: make
$ cat lf.ring.105270.log
2021/08/04 08:09:39 hi!
2021/08/04 08:09:39 loading files: []
2021/08/04 08:09:39 reading file: /home/ring/.config/lf/lfrc
2021/08/04 08:09:39 loading files: []
2021/08/04 08:09:51 pushing keys 5k

66RING avatar Aug 07 '21 03:08 66RING

@66RING From the crash log, it seems that the segfault is happening in getgrgid_r, which is a libc function. Does it crash consistently when you move to certain files? If you can reproduce it, could you generate a core dump with GOTRACEBACK=crash lf?

en0mem avatar May 11 '22 18:05 en0mem

Hi, any updates on this? I've been getting what looks like the same fatal error every other day for months now.

Previously I was using lf-git from the AUR, but never updated it. I'm not sure what the version was, but I installed it around June 2022. Everything was working fine. Early this year I switched to the main repo package (version 28 at the time) and the crashes started immediately.

@SeekingBlues

> Does it crash consistently when you move to certain files?

No. For me, it happens at random only while moving around files / directories with hjkl. Never happens while doing any other operation like opening a file or cutting and pasting.

Initially I thought it had something to do with first time preview generation or first encounter of a file on screen because it seems more likely to crash when new files are present, but I'm really not sure.

> If you can reproduce it, could you generate a core dump with GOTRACEBACK=crash lf?

If you think this would help, can you please explain in detail how I'm supposed to do it?

@gokcehan

> log file

I don't have one in /tmp/.

> Do you have an empty USER variable in the environment?

[jameson@arch ~]$ echo $USER 
jameson 

Any ideas for how I can test this issue to possibly narrow down the cause? I have no idea how to interpret the error message.

2084x avatar Jun 14 '23 01:06 2084x

Hi, @2084x

> If you think this would help, can you please explain in detail how I'm supposed to do it?

Just run GOTRACEBACK=crash lf (the general form is SOME_ENV=value <cmd>), which should generate a core dump file.

But for me the crash no longer happens once I set the GOTRACEBACK=crash environment variable.

66RING avatar Jun 14 '23 02:06 66RING

@66RING Oh I see, it's an env variable. So I've just prefixed lf with GOTRACEBACK=crash in my wrapper script. Hopefully it's all good now...

Where will the file be generated if it crashes again?

2084x avatar Jun 14 '23 07:06 2084x

The backtrace itself should be printed on the console, and the core dump can be retrieved using coredumpctl (systemd specific).

en0mem avatar Jun 14 '23 07:06 en0mem

Using the latest git version, it hasn't happened for me in a while, whereas previously (some months ago) I saw it on a daily basis.

jneidel avatar Jun 14 '23 16:06 jneidel

Thanks for the help guys! Since adding that variable I haven't had any issues.

2084x avatar Jun 21 '23 03:06 2084x

well looks like it's not gone completely. I've had a few crashes over the last month. here's the coredump info from the most recent crash:

[jameson@arch ~]$ coredumpctl info 168684 
           PID: 168684 (lf) 
           UID: 1000 (jameson) 
           GID: 1000 (jameson) 
        Signal: 6 (ABRT) 
     Timestamp: Fri 2023-07-21 19:08:13 AEST (1min 53s ago) 
  Command Line: lf 
    Executable: /usr/bin/lf 
 Control Group: /user.slice/user-1000.slice/session-1.scope 
          Unit: session-1.scope 
         Slice: user-1000.slice 
       Session: 1 
     Owner UID: 1000 (jameson) 
       Boot ID: 2515cafb98c441b398ee216a3f730e0d 
    Machine ID: 8e78a3564f072d699b94099062bad255 
      Hostname: arch 
       Storage: /var/lib/systemd/coredump/core.lf.1000.2515cafb98c441b398ee216a3f730e0d.168684.1689930493000000.zst (present) 
  Size on Disk: 1.0M 
       Message: Process 168684 (lf) of user 1000 dumped core. 
                Stack trace of thread 168709: 
                #0  0x000055b508e95aa1 n/a (lf + 0xc0aa1) 
                #1  0x000055b508e78eae n/a (lf + 0xa3eae) 
                #2  0x000055b508e774c7 n/a (lf + 0xa24c7) 
                #3  0x000055b508e95da9 n/a (lf + 0xc0da9) 
                #4  0x00007f3a97a65ab0 n/a (libc.so.6 + 0x39ab0) 
                #5  0x000055b508e95aa1 n/a (lf + 0xc0aa1) 
                #6  0x000055b508e78a38 n/a (lf + 0xa3a38) 
                #7  0x000055b508e61c91 n/a (lf + 0x8cc91) 
                #8  0x000055b508e61c0c n/a (lf + 0x8cc0c) 
                #9  0x000055b508e618bf n/a (lf + 0x8c8bf) 
                #10 0x000055b508e787a9 n/a (lf + 0xa37a9) 
                #11 0x00007f3a97a68cfd getenv (libc.so.6 + 0x3ccfd) 
                #12 0x00007f3a6c38ab36 _nss_systemd_getgrgid_r (libnss_systemd.so.2 + 0xeb36) 
                #13 0x00007f3a97afe0af getgrgid_r (libc.so.6 + 0xd20af) 
                #14 0x000055b508fe48ca _cgo_6f668e16310a_Cfunc_mygetgrgid_r (lf + 0x20f8ca) 
                #15 0x000055b508e93e01 n/a (lf + 0xbee01) 
                ELF object binary architecture: AMD x86-64 

here is the file at /var/lib/systemd/coredump/ https://files.catbox.moe/encomw.zst

and the file generated with coredumpctl -o lf.coredump dump /usr/bin/lf https://files.catbox.moe/9u5yir.coredump

is there any useful info here?

I started my lf wrapper from a keybind so I didn't catch the terminal output, but I assume it's the same as the log in my first comment.

2084x avatar Jul 21 '23 09:07 2084x

Seems like a race condition between getenv and setenv. In the preview thread, exportOpts is invoked, which calls setenv several times. In the main thread, getgrgid is invoked to get the group name information of a file, which calls into _nss_systemd_getgrgid_r, and then getenv. setenv invalidates the pointers used by getenv, causing the segfault.
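
To make the hazard concrete, here is a minimal Go sketch of one possible mitigation: serializing environment writes and group lookups behind a single lock. It is not lf's code and not the fix that was eventually merged. The names `envMu`, `setenvLocked`, `groupNameLocked` and the variable `LF_SKETCH_OPT` are hypothetical, and the sketch assumes a cgo build in which `user.LookupGroupId` ends up in libc's `getgrgid_r`.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"strconv"
	"sync"
)

// envMu is a hypothetical process-wide lock. Every environment write and
// every group lookup in this sketch takes it, so the two can never overlap.
var envMu sync.Mutex

// setenvLocked wraps os.Setenv so that it cannot run while a lookup is active.
func setenvLocked(key, value string) error {
	envMu.Lock()
	defer envMu.Unlock()
	return os.Setenv(key, value)
}

// groupNameLocked resolves a numeric group ID to a group name while holding
// the same lock. It falls back to the raw ID string if the lookup fails.
func groupNameLocked(gid string) string {
	envMu.Lock()
	defer envMu.Unlock()
	g, err := user.LookupGroupId(gid)
	if err != nil {
		return gid
	}
	return g.Name
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)

	// Stands in for the preview thread exporting options.
	go func() {
		defer wg.Done()
		for i := 0; i < 100; i++ {
			_ = setenvLocked("LF_SKETCH_OPT", strconv.Itoa(i))
		}
	}()

	// Stands in for the main thread resolving group names.
	go func() {
		defer wg.Done()
		gid := strconv.Itoa(os.Getgid())
		for i := 0; i < 100; i++ {
			_ = groupNameLocked(gid)
		}
	}()

	wg.Wait()
	fmt.Println("done")
}
```

A lock like this only protects the code paths that actually take it. Any other libc call that reads the environment would still be exposed, so avoiding environment writes from background threads altogether is the more robust option.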

en0mem avatar Jul 24 '23 21:07 en0mem

> Seems like a race condition between getenv and setenv. In the preview thread, exportOpts is invoked, which calls setenv several times. In the main thread, getgrgid is invoked to get the group name information of a file, which calls into _nss_systemd_getgrgid_r, and then getenv. setenv invalidates the pointers used by getenv, causing the segfault.

Thanks for investigating this! Incidentally, we have also been discussing removing such exports from the preview thread in #1314, see https://github.com/gokcehan/lf/issues/1314#issuecomment-1631141977 for more details.

joelim-work avatar Jul 26 '23 01:07 joelim-work

#1354 has now been merged, which removes the exportOpts call in the preview thread.
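
For readers wondering how a child process can still receive extra variables without the parent calling setenv, the following is a minimal Go sketch of the general pattern: build the child's environment as a slice and hand it to `exec.Cmd.Env`. This is an illustration only and is not taken from #1354. The helper `childEnv` and the variable `LF_SKETCH_WIDTH` are hypothetical, and the example assumes a POSIX `sh` is available.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sort"
)

// childEnv returns a copy of the current environment with the overrides
// appended. It only reads the parent's environment and never calls
// os.Setenv, so the parent process's environment is left untouched.
func childEnv(overrides map[string]string) []string {
	env := os.Environ() // os.Environ returns a copy

	// Sort the keys so the resulting order is deterministic.
	keys := make([]string, 0, len(overrides))
	for k := range overrides {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// exec.Cmd uses the last value for a duplicated key, so appended
	// overrides take precedence over inherited values.
	for _, k := range keys {
		env = append(env, k+"="+overrides[k])
	}
	return env
}

func main() {
	cmd := exec.Command("sh", "-c", `echo "$LF_SKETCH_WIDTH"`)
	cmd.Env = childEnv(map[string]string{"LF_SKETCH_WIDTH": "80"})

	out, err := cmd.Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "running child:", err)
		os.Exit(1)
	}
	fmt.Print(string(out))
}
```

Because nothing in the parent's environment changes, a concurrent getenv inside libc has nothing to race against.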

@66RING, @2084x You can try the latest version in the master branch to see if lf still crashes. If it doesn't then we can close this issue.

joelim-work avatar Jul 27 '23 00:07 joelim-work

Closing this issue for now since #1354 has been merged and there haven't been any further crashes reported.

joelim-work avatar Aug 07 '23 08:08 joelim-work

@joelim-work apologies for the delayed reply. since the crashes happened at random I was waiting a couple of weeks to see if any would occur. I can now report that everything has been working perfectly, so thank you for the fix!

2084x avatar Aug 07 '23 09:08 2084x