rmlint hanging after saying "double free or corruption (out)"

Open nealmcb opened this issue 3 years ago • 2 comments

I'm running rmlint on a large collection of Python virtualenvs (with lots of duplicate directories of Python packages), and ran into a "double free or corruption (out)" message followed by a hang. It looks like some sort of malloc issue, since I found this similar report on Stack Overflow: C++ pointer "error: double free or corruption (out)".

This is the second time I've run rmlint on this directory, and it worked the first time, so perhaps the use of --xattr is a factor, possibly in combination with -D.

$ rmlint -D -x --xattr --followlinks --no-hardlinked -c sh:link -g $HOME/Envs/
                                Traversing (702019 usable files / 0 + 0 ignored files / folders)
                                Traversing (727081 usable files / 0 + 0 ignored files / folders)
                              Traversing (727351 usable files / 0 + 0 ignored files / folders)
                                    Preprocessing (reduces files to 497781 / found 0 other lint)
                                 Preprocessing (reduces files to 285412 / found 7017 other lint)
            Matching (148095 dupes of 43091 originals; 17.03 MB to scan in 9613 files, ETA: 30s)
            Matching (148117 dupes of 43106 originals; 12.73 MB to scan in 9548 files, ETA: 21s)
                    Matching (153791 dupes of 44300 originals; 0 B to scan in 0 files, ETA:  2s)

                                                     Merging files into directories (stand by...)

==> In total 727351 files, whereof 153791 are duplicates in 44300 groups.
==> This equals 3.63 GB of duplicates which could be removed.
==> 7017 other suspicious item(s) found, which may vary in size.
==> Scanning took in total  3m 35.328s.

Wrote a json file to: /srv/bs/neal/rmlint/rmlint.json
Wrote a sh file to: /srv/bs/neal/rmlint/rmlint.sh
double free or corruption (out)

After that output it hangs, with over 10 minutes of inactivity so far. I don't notice any CPU activity now. C-c doesn't kill it, but I can ^Z it and then kill %1.
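
For unattended runs, a hang like this can be bounded with a small watchdog. The following is only a minimal Python sketch, not part of rmlint: the helper name `run_with_watchdog` and the timeout value are illustrative assumptions, and it assumes the stuck process can still be terminated by SIGKILL.

```python
import subprocess
import sys


def run_with_watchdog(argv, timeout_s):
    """Run argv; if it is still alive after timeout_s seconds, kill it.

    Returns the child's exit status, or None if it had to be killed.
    (Hypothetical helper for illustration, not an rmlint feature.)
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # On POSIX, Popen.kill() sends SIGKILL, which cannot be caught or
        # ignored, unlike the SIGINT that C-c delivers.
        proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
        return None


if __name__ == "__main__":
    # Stand-in for the hanging run: a child process that sleeps for a minute.
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_watchdog(hung, timeout_s=1))
```

In practice the stand-in command would be replaced by the real rmlint invocation and a much longer timeout, chosen relative to the scan time reported above.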

I've run it a few times with valgrind as described at https://rmlint.readthedocs.io/en/latest/rmlint.1.html#bugs but it hasn't recurred.

I have seen it recur though, running it with gdb. The end of the stdout is a bunch of ls and rm commands, then:

Thread 1 "rmlint" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#0  0x00007ffff64cbfb7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff64cd921 in __GI_abort () at abort.c:79
#2  0x00007ffff6516967 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff6643b0d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff651d9da in malloc_printerr (str=str@entry=0x7ffff6645838 "double free or corruption (!prev)") at malloc.c:5342
#4  0x00007ffff6524f7c in _int_free (have_lock=0, p=0x7fffe17e4830, av=0x7fffec000020) at malloc.c:4311
#5  0x00007ffff6524f7c in __GI___libc_free (mem=0x7fffe17e4840) at malloc.c:3134
#6  0x0000555555582221 in  ()
#7  0x0000555555564d00 in  ()
#8  0x000055555556b26a in  ()
#9  0x000055555555cb84 in  ()
#10 0x00007ffff64aebf7 in __libc_start_main (main=
    0x55555555ca40, argc=10, argv=0x7fffffffcb78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffcb68) at ../csu/libc-start.c:310
#11 0x000055555555d14a in  ()
(gdb) 

with this at the end of the stderr:

WARNING: Unexpected return code 3 from rm_util_link_type()
WARNING: Unexpected return code 3 from rm_util_link_type()
double free or corruption (!prev)
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Note that I didn't recompile with GDB=1, which is presumably why frames #6 to #9 in the backtrace show no symbol names.

$ rmlint --version
version 2.10.1 compiled: Dec 6 2020 at [15:03:34] "Ludicrous Lemur" (rev 2a4443d1)
compiled with: +mounts +nonstripped +fiemap +sha512 +bigfiles +intl +replay +xattr +btrfs-support

nealmcb, Dec 08 '20 04:12

Hey @nealmcb,

thanks for the report. That's interesting indeed, but to find the actual corruption I would need a better stack trace. Please re-compile rmlint like this and try to reproduce the issue:

$ scons DEBUG=1 VERBOSE=0 GDB=1
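
Once the debug build exists, the backtrace can be captured without an interactive gdb session by running gdb in batch mode. The following is a minimal Python sketch that only assembles such a command line. The helper name `gdb_batch_argv`, the binary path `./rmlint`, and the directory `/path/to/Envs` are illustrative assumptions. The gdb options used (`-batch`, `-ex`, `--args`) are standard gdb flags, and the rmlint options are the ones from the original report.

```python
import shlex


def gdb_batch_argv(program, program_args):
    """Build a gdb command line that runs the program and dumps backtraces.

    Hypothetical helper for illustration; it only constructs the argument
    list and does not start gdb itself.
    """
    return [
        "gdb", "-batch",                    # exit after the -ex commands finish
        "-ex", "run",                       # start the program under gdb
        "-ex", "thread apply all bt full",  # after the abort, print every thread's stack
        "--args", program, *program_args,
    ]


if __name__ == "__main__":
    # Assumed location of the freshly built binary and a placeholder directory.
    rmlint_args = ["-D", "-x", "--xattr", "--followlinks", "--no-hardlinked",
                   "-c", "sh:link", "-g", "/path/to/Envs"]
    print(shlex.join(gdb_batch_argv("./rmlint", rmlint_args)))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run` with stderr redirected to a file so that the trace can be attached to the issue.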

> This is the second time I've run rmlint on this directory, and it worked the first time, so perhaps the use of --xattr is a factor, possibly in combination with -D.

This could be a reason, since the memory management with -D is indeed a bit... confused.

> I've run it a few times with valgrind as described at https://rmlint.readthedocs.io/en/latest/rmlint.1.html#bugs but it hasn't recurred.

Sounds like it's timing-related, then. Valgrind makes things a lot slower...

sahib, Dec 18 '20 11:12

This should be closed in favor of #562.

cebtenzzre, Feb 20 '22 23:02