fs_extra icon indicating copy to clipboard operation
fs_extra copied to clipboard

Unexpected behaviour of dir::get_size

Open gliderkite opened this issue 5 years ago • 5 comments

In my application, I need to get the size of a directory, and in order to avoid having to re-invent the wheel I was planning to use the dir::get_size method provided by this library (thanks for your work!).

Unfortunately though, the results I am getting differ from the ones expected (I am targeting a Linux machine). This is because the implementation does not return the size of a directory, but the sum of all the sizes of its files, without taking into account the size of the directory structure itself.

You can quickly test this behaviour on a Linux machine by running the mention method on a given directory, and then running for the same directory the command du --bytes -s {dir_path}.

If this is the expected behaviour, I would simply suggest to explicitly state it in the documentation in a clear way.

If this is not the expected behaviour, it can be easily fixed by slightly changing the implementation:

pub fn get_size<P>(path: P) -> Result<u64>
where
    P: AsRef<Path>,
{
    let mut result = 0;

    if path.as_ref().is_dir() {
        for entry in read_dir(&path)? {
            let _path = entry?.path();
            if _path.is_file() {
                result += _path.metadata()?.len();
            } else {
                result += get_size(_path)?;
            }
        }
        // add the directory data structure size
        result += path.as_ref().metadata()?.len();
    } else {
        result = path.as_ref().metadata()?.len();
    }
    Ok(result)
}

I understand this will also require some work to update the tests to reflect the new implementation. Please let me know if support is needed for this (can't promise when I'll be able to have a deeper look into this in case it's required).

gliderkite avatar Oct 04 '19 10:10 gliderkite

Sorry for the late response. I missed a notification about it... Answer your question: Yes, it's expected behavior and unfortunately, i don't remember why... I was very busy this year... But soon i wan to return to the project and rewrite it. I wanna change API and realize new ideas(including yours) :smiley:

webdesus avatar Nov 20 '19 13:11 webdesus

I had the same problem. If this is the expected behaviour please note this in the documentation because for most people this is probably unexpected.

Also it'd be nice to know that get_size actually returns the amount of bytes not just a "size", so that should probably be added to the documentation.

If you want to change the behaviour but don't have time for this, I could try to create a PR for this problem. The problem should be fixed with a single additional line of code. But as already said tests will probably break. Also tests won't be reproducible any more (at least if the test directory contains further directories) because the folder size might vary in different file systems.

AaronErhardt avatar Dec 02 '20 22:12 AaronErhardt

@AaronErhardt I would really appreciate your, if you send PR :blush: Or i can try on the weekend fix it by myself

webdesus avatar Dec 03 '20 06:12 webdesus

Here's my PR #34 :)

AaronErhardt avatar Dec 03 '20 15:12 AaronErhardt

Unfortunately the fix in #34 didn't quite fix this issue. Specifically this change: https://github.com/webdesus/fs_extra/commit/e66ebf2148442c1f86c5120bfb607846e34ea11a#diff-9587f356b571dc31449279b19bc236db49889affe5e903deacb5cb0e8d121896L757-R757

Whilst after that change dir::get_size now counts the sizes of subdirectories within the hierarchy, it still doesn't count the parent directory's size (the initial path passed to dir::get_size).

In addition, the change from is_file() to is_dir() meant that symlinks were now included in the total, but unfortunately not as the symlink sizes, but the sizes of their targets. This caused #59.

I have a fix locally for both the last part of this issue, and also #59 :-)

edmorley avatar Jul 01 '22 14:07 edmorley