
max-depth semantics seem incorrect?

Open stapelberg opened this issue 5 years ago • 7 comments

/tmp/nested/sim % ls -lR
.:
total 0
drwxr-xr-x 4 michael michael 80 2020-05-10 11:11 host1
drwxr-xr-x 3 michael michael 60 2020-05-10 11:11 host2

./host1:
total 0
drwxr-xr-x 2 michael michael 60 2020-05-10 11:18 2020-04-10
drwxr-xr-x 2 michael michael 60 2020-05-10 11:18 2020-05-10

./host1/2020-04-10:
total 1024
-rw-r--r-- 1 michael michael 1048576 2020-05-10 11:18 big

./host1/2020-05-10:
total 2048
-rw-r--r-- 1 michael michael 2097152 2020-05-10 11:18 bigger

./host2:
total 0
drwxr-xr-x 2 michael michael 60 2020-05-10 11:11 2019-01-01

./host2/2019-01-01:
total 0
-rw-r--r-- 1 michael michael 0 2020-05-10 11:11 hey
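For anyone wanting to reproduce this, the tree above can be recreated with something like the following sketch (GNU coreutils assumed; sizes taken from the listing):

```shell
# Recreate the test tree from the ls -lR listing above.
mkdir -p /tmp/nested/sim/host1/2020-04-10 \
         /tmp/nested/sim/host1/2020-05-10 \
         /tmp/nested/sim/host2/2019-01-01
dd if=/dev/zero of=/tmp/nested/sim/host1/2020-04-10/big    bs=1M count=1 status=none
dd if=/dev/zero of=/tmp/nested/sim/host1/2020-05-10/bigger bs=1M count=2 status=none
: > /tmp/nested/sim/host2/2019-01-01/hey   # empty file
```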

Using --max-depth 0 seems to be the same thing as not specifying a limit:

% duc index --database=/tmp/duc.db --max-depth 0 --progress $PWD 
[#-------] Indexed 3.0Mb in 3 files and 6 directories

% duc ls --database=/tmp/duc.db                                 
  3.0M host1                                          
     0 host2

% duc ls --database=/tmp/duc.db host1
  2.0M 2020-05-10
  1.0M 2020-04-10

% duc ls --database=/tmp/duc.db host1/2020-05-10
  2.0M bigger

Maybe the manpage could mention that?

Next up, --max-depth 1. This seems to result in an empty database, and makes duc crash when accessing a specific path:

% duc index --database=/tmp/duc.db --max-depth 1 --progress $PWD 
[#-------] Indexed 3.0Mb in 3 files and 6 directories

% duc ls --database=/tmp/duc.db                                 

% duc ls --database=/tmp/duc.db host1
Requested path not found
zsh: segmentation fault (core dumped)  duc ls --database=/tmp/duc.db host1

So let’s try --max-depth 2. This gives me what I intuitively think of as a depth of 1: I do see the host1 and host2 subdirectories of the indexed directory, but nothing underneath them:

% duc index --database=/tmp/duc.db --max-depth 2 --progress $PWD 
[#-------] Indexed 3.0Mb in 3 files and 6 directories

% duc ls --database=/tmp/duc.db                                 
  3.0M host1
     0 host2

% duc ls --database=/tmp/duc.db host1

So maybe we need --max-depth 3? This gives me host1, host2, their subdirectories (as desired!) but also files within those subdirectories (unexpected)?!

% duc index --database=/tmp/duc.db --max-depth 3 --progress $PWD 
[#-------] Indexed 3.0Mb in 3 files and 6 directories

% duc ls --database=/tmp/duc.db                                 
  3.0M host1
     0 host2

% duc ls --database=/tmp/duc.db host1                           
  2.0M 2020-05-10
  1.0M 2020-04-10

% duc ls --database=/tmp/duc.db host1/2020-05-10
  2.0M bigger

Can you explain how --max-depth is supposed to work in my scenario? I want the first level (host1 and host2) as well as the level underneath (dated subdirectories like 2020-05-10), but nothing more. It doesn’t seem to be possible to do this right now, as per the above examples?
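For comparison, GNU du's --max-depth (assuming GNU coreutils; duc's flag clearly behaves differently) limits only what is printed, not what is traversed, so its totals always add up; a sketch against a throwaway copy of the tree:

```shell
# GNU du's --max-depth only limits display depth; the traversal is complete,
# so every printed total already includes everything below it.
# Throwaway copy of the tree from this issue:
root=$(mktemp -d)
mkdir -p "$root/host1/2020-04-10" "$root/host1/2020-05-10" "$root/host2/2019-01-01"
dd if=/dev/zero of="$root/host1/2020-04-10/big"    bs=1M count=1 status=none
dd if=/dev/zero of="$root/host1/2020-05-10/bigger" bs=1M count=2 status=none
: > "$root/host2/2019-01-01/hey"
# Shows host1, host2 and their dated subdirectories, but nothing deeper:
du --apparent-size --max-depth=2 "$root"
```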

stapelberg avatar May 10 '20 09:05 stapelberg

I can't explain how the --max-depth code is supposed to work, as I haven't used it myself or played with it, but I will look into the segfault and come up with a patch when I get a chance.

If Zevv doesn't beat me to it. :-)

l8gravely avatar May 14 '20 01:05 l8gravely

I'm afraid the semantics are indeed not well defined, and it's hard to come up with something that makes sense. I believe the original intent was to fully traverse the file system, but only store the totals up to depth N. The problem is that the resulting output often does not make sense, because the end result does not add up to the total. I think we have to take a good look at this to see if and how it can be fixed.
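The intent described above (traverse fully, aggregate totals only up to depth N) can be sketched in plain shell; the toy tree, the depth value, and the find/awk pipeline are illustrative, not duc's actual implementation:

```shell
# Sketch of "traverse fully, aggregate at depth N": every file is visited,
# and its size is credited to its ancestor at depth N, so the per-entry
# totals still sum to the grand total. Paths containing spaces are not
# handled; GNU find's -printf is assumed.
root=$(mktemp -d)
mkdir -p "$root/a/x" "$root/b"
dd if=/dev/zero of="$root/a/x/f1" bs=1k count=3 status=none
dd if=/dev/zero of="$root/b/f2"   bs=1k count=5 status=none
N=1
find "$root" -type f -printf '%s %P\n' | awk -v n="$N" '
  { parts_n = split($2, parts, "/")
    keep = (parts_n < n) ? parts_n : n      # truncate path to n components
    key = parts[1]
    for (i = 2; i <= keep; i++) key = key "/" parts[i]
    total[key] += $1; grand += $1 }
  END { for (key in total) printf "%d\t%s\n", total[key], key
        printf "%d\ttotal\n", grand }'
# prints 3072 for a, 5120 for b, and 8192 as the total
```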

zevv avatar May 14 '20 05:05 zevv

Hi Michael,

Could you give us some more details on how you built duc? I haven't been able to re-create your segfault, but I don't use zsh (which shouldn't be a problem...), and I wonder which backend you're using.

Can you share the output of:

cat /proc/cpuinfo
duc --version

CFLAGS="-g" ./configure --prefix=/var/tmp/duc

and share the output at the end of that process, before you then run:

make -j 4

and maybe also attach a tar or zip file of the test directories and files you set up? I haven't been able to make it crash or segfault when I compile on Debian Stretch running my own compiled kernel 5.0.21, so I'm looking for a reproducer.

Thanks, John

P.S. Of course once I write and send this email, it will fail. Right? :-)

l8gravely avatar May 15 '20 02:05 l8gravely

Sure. I was running into the segfault with 1.4.4 from the Arch Linux AUR: https://aur.archlinux.org/packages/duc/, using the leveldb backend:

% duc --version
duc version: 1.4.4
options: cairo x11 ui leveldb

Unfortunately, Arch Linux packages don’t include debug info by default, so loading the core dump into gdb isn’t very useful:

% gdb =duc /tmp/core.3086571  
Reading symbols from /usr/bin/duc...
(No debugging symbols found in /usr/bin/duc)
[New LWP 3086571]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `duc ls --database=/tmp/duc.db host1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000562f9b68a669 in ?? ()
(gdb) bt
#0  0x0000562f9b68a669 in  ()
#1  0x0000562f9b694fa2 in  ()
#2  0x0000562f9b695634 in  ()
#3  0x0000562f9b69573f in  ()
#4  0x0000562f9b68924f in  ()
#5  0x00007fba6ba16023 in __libc_start_main () at /usr/lib/libc.so.6
#6  0x0000562f9b6893ee in  ()
(gdb) quit

Next, I tried building from git so that I get debug symbols. Unfortunately (?), I cannot reproduce the crash with the version built from git, so maybe it was fixed in the meantime?

% ~/src/duc/duc ls --database=/tmp/duc.db host1
The requested path 'host1' was not found in the database,
Please run 'duc info' for a list of available directories.

% ~/src/duc/duc --version                      
duc version: 1.4.4
options: cairo x11 ui leveldb

I then rebuilt the 1.4.4 tarball which the AUR package uses, and I can reproduce the crash with it:

% /tmp/rebuild/duc-1.4.4/duc --version   
duc version: 1.4.4
options: cairo x11 ui leveldb

% gdb /tmp/rebuild/duc-1.4.4/duc /tmp/core.3102527 
Reading symbols from /tmp/rebuild/duc-1.4.4/duc...
[New LWP 3102527]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `/tmp/rebuild/duc-1.4.4/duc ls --database=/tmp/duc.db host1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000564dc6544f49 in duc_dir_read (dir=dir@entry=0x0, st=st@entry=DUC_SIZE_TYPE_ACTUAL, sort=sort@entry=DUC_SORT_SIZE) at src/libduc/dir.c:286
286		dir->duc->err = 0;
(gdb) bt
#0  0x0000564dc6544f49 in duc_dir_read (dir=dir@entry=0x0, st=st@entry=DUC_SIZE_TYPE_ACTUAL, sort=sort@entry=DUC_SORT_SIZE) at src/libduc/dir.c:286
#1  0x0000564dc654f36f in ls_one (dir=dir@entry=0x0, level=level@entry=0, parent_path_len=parent_path_len@entry=0) at src/duc/cmd-ls.c:87
#2  0x0000564dc654f9ac in do_one (duc=duc@entry=0x564dc7c0a010, path=0x7ffd8df65850 "host1") at src/duc/cmd-ls.c:229
#3  0x0000564dc654faaf in ls_main (duc=0x564dc7c0a010, argc=<optimized out>, argv=<optimized out>) at src/duc/cmd-ls.c:271
#4  0x0000564dc6543b9f in main (argc=<optimized out>, argv=<optimized out>) at src/duc/main.c:177
(gdb) bt full
#0  0x0000564dc6544f49 in duc_dir_read (dir=dir@entry=0x0, st=st@entry=DUC_SIZE_TYPE_ACTUAL, sort=sort@entry=DUC_SORT_SIZE) at src/libduc/dir.c:286
        fn_comp = <optimized out>
#1  0x0000564dc654f36f in ls_one (dir=dir@entry=0x0, level=level@entry=0, parent_path_len=parent_path_len@entry=0) at src/duc/cmd-ls.c:87
        max_size = 0
        max_name_len = 0
        max_size_len = 6
        st = DUC_SIZE_TYPE_ACTUAL
        sort = DUC_SORT_SIZE
        tree = 0x564dc655c3e0 <tree_utf8>
        e = <optimized out>
        count = <optimized out>
        n = <optimized out>
#2  0x0000564dc654f9ac in do_one (duc=duc@entry=0x564dc7c0a010, path=0x7ffd8df65850 "host1") at src/duc/cmd-ls.c:229
        dir = 0x0
#3  0x0000564dc654faaf in ls_main (duc=0x564dc7c0a010, argc=<optimized out>, argv=<optimized out>) at src/duc/cmd-ls.c:271
        r = 0
#4  0x0000564dc6543b9f in main (argc=<optimized out>, argv=<optimized out>) at src/duc/main.c:177
        r = <optimized out>
        duc = 0x564dc7c0a010
        cmd = 0x564dc655c140 <cmd_ls>
        ducrc = 0x564dc8512830
        home = 0x7ffd8df658a3 "/home/michael"
        log_level = <optimized out>
(gdb) 

I’m currently using Linux 5.5.8, but it seems unlikely that the kernel version would matter :)

stapelberg avatar May 15 '20 06:05 stapelberg

Hi Michael, thanks for the great info! I'll start poking more at the duc_dir_read() function, but would it be possible to get a copy of your DB to look over as well? You can just send it to me directly via email attachment if you like: [email protected]

Can you give us more info on the version of leveldb you're using as well? I have 1.18-5 on my Debian box. And would it be possible for you to try an older version, say 1.4.3, on your bad DB as well? That would be interesting, to see if we can chase this down.

Looking at src/libduc/dir.c, not much has really changed there in quite a while.

Ideally, this is already fixed in master, and we should really just spin a v1.4.5 or v1.5 release.

John

l8gravely avatar May 15 '20 12:05 l8gravely

I used git bisect to track it down to commit 8afb1bac7b3cd1b509084cd5da2dbce31bc83fa6, which fixes the segfault.
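As an aside, bisecting for a commit that fixes a bug works by renaming the bisect terms (supported since git 2.7); here is a sketch against a toy repository, with `grep -q fixed state` standing in for a real test:

```shell
# Bisecting for the commit that FIXES a bug: rename the bisect terms so
# git searches for the first "fixed" commit (git >= 2.7). Toy repo: the
# fix lands in the 3rd of 5 commits.
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "A U Thor"
for i in 1 2 3 4 5; do
  echo "$i" > n
  if [ "$i" -ge 3 ]; then echo fixed > state; else echo broken > state; fi
  git add n state
  git commit -qm "commit $i"
done
git bisect start --term-new=fixed --term-old=broken
git bisect fixed HEAD                                     # newest commit: fixed
git bisect broken "$(git rev-list --max-parents=0 HEAD)"  # oldest: still broken
# With bisect run, exit 0 marks the old term (broken), non-zero the new (fixed)
git bisect run sh -c '! grep -q fixed state'
# git reports the first fixed commit ("commit 3" here)
```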

stapelberg avatar May 15 '20 20:05 stapelberg

"Michael" == Michael Stapelberg [email protected] writes:

Michael> I used git bisect to track it down to commit 8afb1ba, which fixes the segfault.

Awesome! Thanks for confirming it. I was playing with this off and on today but couldn't get a segfault in my tests. Looks like we really need to make a new release to roll up various fixes and changes.

It's also good to know that some distros are using other DBs by default.

John

l8gravely avatar May 15 '20 20:05 l8gravely