aiida-core Add `get_total_size_on_disk` method to `RemoteData`

I added this while writing the code in PR #6578 that warns the user when the files to be retrieved may be large, but I think it warrants a separate PR.

The simplest way I found to obtain the total size of the directory associated with a `RemoteData` object was to call `listdir_withattributes` recursively. If anybody is aware of a better alternative, please feel free to comment.
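To illustrate the recursive idea, here is a minimal sketch, not the code in this PR. It assumes that `listdir_withattributes(path)` returns a list of dicts with the keys `name`, `attributes` (an object exposing `st_size`) and `isdir`. `LocalListing` and `total_size_recursive` are hypothetical names; `LocalListing` is a local stand-in for a transport so the example runs without a remote machine.

```python
import os
import stat
import tempfile
from types import SimpleNamespace


class LocalListing:
    """Hypothetical stand-in for a transport; lists a local directory instead of a remote one."""

    def listdir_withattributes(self, path):
        entries = []
        for name in sorted(os.listdir(path)):
            st = os.lstat(os.path.join(path, name))  # lstat: do not follow symlinks
            entries.append(
                {
                    'name': name,
                    'attributes': SimpleNamespace(st_size=st.st_size),
                    'isdir': stat.S_ISDIR(st.st_mode),
                }
            )
        return entries


def total_size_recursive(transport, path):
    """Sum the sizes of all non-directory entries below ``path``, in bytes."""
    total = 0
    for entry in transport.listdir_withattributes(path):
        child = os.path.join(path, entry['name'])
        if entry['isdir']:
            # Descend into subdirectories; the directory entry itself is not counted.
            total += total_size_recursive(transport, child)
        else:
            total += entry['attributes'].st_size
    return total


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, 'sub'))
        with open(os.path.join(root, 'a.txt'), 'wb') as handle:
            handle.write(b'x' * 10)
        with open(os.path.join(root, 'sub', 'b.txt'), 'wb') as handle:
            handle.write(b'y' * 20)
        print(total_size_recursive(LocalListing(), root))
```

Each directory costs one listing call, so on a real transport the number of round trips grows with the number of directories, which is the main cost of this approach.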

Will add some tests shortly.

Oct 15 '24 16:10 GeigerJ2

Codecov Report

Attention: Patch coverage is 82.60870% with 16 lines in your changes missing coverage. Please review.

Project coverage is 77.94%. Comparing base (c532b34) to head (9c3d2ba). Report is 39 commits behind head on main.

Files with missing lines	Patch %	Lines
src/aiida/orm/nodes/data/remote/base.py	88.24%	8 Missing :warning:
src/aiida/cmdline/commands/cmd_data/cmd_remote.py	50.00%	7 Missing :warning:
src/aiida/common/utils.py	90.00%	1 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6584      +/-   ##
==========================================
+ Coverage   77.92%   77.94%   +0.02%     
==========================================
  Files         563      563              
  Lines       41671    41761      +90     
==========================================
+ Hits        32467    32545      +78     
- Misses       9204     9216      +12

Oct 15 '24 16:10 codecov[bot]

This may be machine-dependent, but rather than going only via our API (which is more robust, but I think definitely going to be slower), how about having a first "fast" option that just runs `du -s` and parses the output? Be careful about units, though: `du` reports "blocks", which on some systems are 512 or 2048 bytes! And if it fails, fall back to your solution?
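The unit caveat can be sketched as follows. This is only an illustration under stated assumptions: `parse_du_output` is a hypothetical helper, and the caller is assumed to know the block size that `du` used. One way to pin the unit is the POSIX `-k` flag, which makes `du` report 1024-byte units regardless of the platform default.

```python
def parse_du_output(stdout: str, block_size: int) -> int:
    """Convert the first line of ``du -s`` output into bytes.

    ``du -s`` prints ``<count><whitespace><path>``, where ``<count>`` is in
    blocks, so it has to be multiplied by the block size that ``du`` used.
    """
    lines = stdout.strip().splitlines()
    if not lines:
        raise ValueError('empty du output')
    # The count is the first whitespace-separated field; the path may contain spaces.
    count_field = lines[0].split(None, 1)[0]
    return int(count_field) * block_size  # int() raises ValueError on garbage


# The same printed count means different sizes depending on the unit:
print(parse_du_output('8\t/scratch/job 1\n', block_size=512))
print(parse_du_output('8\t/scratch/job 1\n', block_size=1024))
```

Raising `ValueError` on empty or non-numeric output gives the caller a single signal for switching to the slower fallback.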

Oct 17 '24 12:10 giovannipizzi

Note to self: run `du` via the transport's `exec_command_wait` method.

Oct 17 '24 12:10 GeigerJ2

Thanks for the review, @agoscinski! I'm currently still working on this, will ping you once it's again ready for review.

> I think the output of `du --apparent-size` is the same as with `lstat`, so you do not need to use `lstat`

I'm providing `lstat` as a fallback for the cases where `du` is not available (e.g. on macOS, as you mentioned; I didn't know that ^^) or where `exec_command_wait` isn't available, which will be the case for FirecREST.
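A rough sketch of that fast-path-then-fallback logic, not the implementation in this PR: it assumes that `exec_command_wait(command)` returns a `(retval, stdout, stderr)` tuple and that a transport without command execution raises `NotImplementedError`. `size_via_du`, `total_size_with_fallback`, `FakeTransport` and `NoExecTransport` are hypothetical names used only for illustration. The sketch uses the POSIX `-k` flag, which reports allocated 1024-byte units rather than the apparent size.

```python
import shlex


def size_via_du(transport, path):
    """Fast path: ask ``du`` for the size in 1024-byte units and convert to bytes."""
    command = f'du -s -k {shlex.quote(path)}'
    retval, stdout, _stderr = transport.exec_command_wait(command)
    fields = stdout.split()
    if retval != 0 or not fields:
        raise RuntimeError(f'du failed with exit code {retval}')
    return int(fields[0]) * 1024


def total_size_with_fallback(transport, path, fallback):
    """Try ``du`` first; on any failure use ``fallback(path)``, e.g. a recursive listing."""
    try:
        return size_via_du(transport, path), 'du'
    except (NotImplementedError, RuntimeError, ValueError):
        return fallback(path), 'fallback'


class FakeTransport:
    """Hypothetical transport that returns a canned ``du`` result."""

    def __init__(self, retval, stdout):
        self.retval = retval
        self.stdout = stdout

    def exec_command_wait(self, command):
        return self.retval, self.stdout, ''


class NoExecTransport:
    """Hypothetical transport that cannot execute commands at all."""

    def exec_command_wait(self, command):
        raise NotImplementedError('command execution is not supported')


def slow_size(path):
    # Placeholder for the slower, API-based computation.
    return 999


print(total_size_with_fallback(FakeTransport(0, '12\t/scratch/run\n'), '/scratch/run', slow_size))
print(total_size_with_fallback(NoExecTransport(), '/scratch/run', slow_size))
```

Returning the method name alongside the size lets the caller report which path produced the number, which matters because the two can legitimately differ (allocated blocks versus summed file sizes).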

Dec 10 '24 15:12 GeigerJ2

OK, this should be ready for a final review, @agoscinski and @khsrali. Also pinging @mikibonacci, in case you want to provide some feedback on the CLI/API for use in AiiDAlab.

Dec 11 '24 13:12 GeigerJ2

Thanks again for the review, @khsrali, I implemented your proposed changes.

Dec 12 '24 11:12 GeigerJ2

Thanks again for the review, @khsrali. I wrote down my reasoning for point 1 in my response to your comment in the code, and implemented point 2. Once CI passes here (hopefully), I'll squash-merge.

Dec 19 '24 08:12 GeigerJ2