rmlint
rmlint hanging after saying "double free or corruption (out)"
I'm running rmlint on a large collection of python virtualenvs (with lots of duplicate directories of python packages), and ran into a "double free or corruption (out)" followed by a hang. So it looks like some sort of malloc issue, since I notice this similar report: C++ pointer "error: double free or corruption (out)" - Stack Overflow
This is the second time I've run rmlint on this directory and it worked the first time, so perhaps the use of --xattr is a factor, perhaps in combination with -D.
$ rmlint -D -x --xattr --followlinks --no-hardlinked -c sh:link -g $HOME/Envs/
Traversing (702019 usable files / 0 + 0 ignored files / folders)
Traversing (727081 usable files / 0 + 0 ignored files / folders)
Traversing (727351 usable files / 0 + 0 ignored files / folders)
Preprocessing (reduces files to 497781 / found 0 other lint)
Preprocessing (reduces files to 285412 / found 7017 other lint)
Matching (148095 dupes of 43091 originals; 17.03 MB to scan in 9613 files, ETA: 30s)
Matching (148117 dupes of 43106 originals; 12.73 MB to scan in 9548 files, ETA: 21s)
Matching (153791 dupes of 44300 originals; 0 B to scan in 0 files, ETA: 2s)
Merging files into directories (stand by...)
==> In total 727351 files, whereof 153791 are duplicates in 44300 groups.
==> This equals 3.63 GB of duplicates which could be removed.
==> 7017 other suspicious item(s) found, which may vary in size.
==> Scanning took in total 3m 35.328s.
Wrote a json file to: /srv/bs/neal/rmlint/rmlint.json
Wrote a sh file to: /srv/bs/neal/rmlint/rmlint.sh
double free or corruption (out)
After that output it has been hanging for over 10 minutes with no activity so far. I don't see any CPU activity now. C-c doesn't kill it, but I can ^Z it and kill %1.
I've run it a few times with valgrind as described at https://rmlint.readthedocs.io/en/latest/rmlint.1.html#bugs but it hasn't recurred.
I have seen it recur though, running it with gdb. The end of the stdout is a bunch of ls and rm commands, then:
Thread 1 "rmlint" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#0 0x00007ffff64cbfb7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff64cd921 in __GI_abort () at abort.c:79
#2 0x00007ffff6516967 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff6643b0d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff651d9da in malloc_printerr (str=str@entry=0x7ffff6645838 "double free or corruption (!prev)") at malloc.c:5342
#4 0x00007ffff6524f7c in _int_free (have_lock=0, p=0x7fffe17e4830, av=0x7fffec000020) at malloc.c:4311
#5 0x00007ffff6524f7c in __GI___libc_free (mem=0x7fffe17e4840) at malloc.c:3134
#6 0x0000555555582221 in ()
#7 0x0000555555564d00 in ()
#8 0x000055555556b26a in ()
#9 0x000055555555cb84 in ()
#10 0x00007ffff64aebf7 in __libc_start_main (main=0x55555555ca40, argc=10, argv=0x7fffffffcb78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffcb68) at ../csu/libc-start.c:310
#11 0x000055555555d14a in ()
(gdb)
with this at the end of the stderr:
WARNING: Unexpected return code 3 from rm_util_link_type()
WARNING: Unexpected return code 3 from rm_util_link_type()
double free or corruption (!prev)
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
Note that I didn't recompile with GDB=1, which is presumably why the rmlint frames (#6 to #9 and #11) show no symbols.
$ rmlint --version
version 2.10.1 compiled: Dec 6 2020 at [15:03:34] "Ludicrous Lemur" (rev 2a4443d1)
compiled with: +mounts +nonstripped +fiemap +sha512 +bigfiles +intl +replay +xattr +btrfs-support
Hey @nealmcb,
thanks for the report. That's interesting indeed, but to find the actual corruption I would need a better stack trace. Please re-compile rmlint like this and try to reproduce the issue:
$ scons DEBUG=1 VERBOSE=0 GDB=1
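With a debug build in place, the backtrace can also be captured non-interactively. This is only a minimal sketch, assuming gdb is installed; `bt_on_crash` is a hypothetical helper name and not part of rmlint:

```shell
# Hypothetical helper (not part of rmlint): run a command under gdb in batch
# mode. If the program stops on a signal such as SIGABRT, gdb then prints a
# full backtrace of every thread before exiting.
bt_on_crash() {
    gdb -q -batch \
        -ex run \
        -ex 'thread apply all bt full' \
        --args "$@"
}
```

It could be invoked with the same command line as in the report, for example `bt_on_crash rmlint -D -x --xattr --followlinks --no-hardlinked -c sh:link -g $HOME/Envs/`. If the run finishes without crashing, gdb simply reports that there is no stack.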
This is the second time I've run rmlint on this directory and it worked the first time, so perhaps the use of --xattr is a factor, perhaps in combination with -D.
This could be a reason, since the memory management with -D is indeed a bit... confused.
I've run it a few times with valgrind as described at https://rmlint.readthedocs.io/en/latest/rmlint.1.html#bugs but it hasn't recurred.
Sounds like timing-related then. Valgrind makes things a lot slower...
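If the crash really is timing-related, one way to catch it is to repeat the same run until it fails. The following is a rough sketch under that assumption; `repeat_until_fail` is a hypothetical helper, not an rmlint feature:

```shell
# Hypothetical helper: re-run a command up to N times and stop at the first
# run that exits with a non-zero status (a SIGABRT typically shows up as 134).
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        "$@"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "run $i failed with exit status $status" >&2
            return "$status"
        fi
        i=$((i + 1))
    done
    echo "no failure in $max runs" >&2
    return 0
}
```

For example, `repeat_until_fail 20 rmlint -D -x --xattr --followlinks --no-hardlinked -c sh:link -g $HOME/Envs/`. Since the reported failure ends in a hang rather than an exit, it may help to wrap the command in coreutils `timeout -k 10 600` so that a stuck run is killed and counted as a failure; whether that is enough to end the hung process here is untested.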
This should be closed in favor of #562.