MSYS2-packages

`checking available disk space` is slowest operation

Open goyalyashpal opened this issue 2 years ago • 23 comments

  • on doing pacman -Syu, the checking available disk space is almost always the slowest step
  • and i always wonder why? like what makes it take sooo much time? and if it can be improved
  • it's performing nothing like downloading, extracting, copying, searching etc...
:: Starting full system upgrade...
resolving dependencies...
looking for conflicting packages...

Packages (63) ...

Total Download Size:   138.77 MiB
Total Installed Size:  886.21 MiB
Net Upgrade Size:       -1.71 MiB

:: Proceed with installation? [Y/n]
:: Retrieving packages...
...
 Total (63/63) ...
(63/63) checking keys in keyring ...
(63/63) checking package integrity ...
(63/63) loading package files ...
(63/63) checking for file conflicts ...
(63/63) checking available disk space ...
:: Processing package changes...

goyalyashpal avatar Nov 15 '23 21:11 goyalyashpal

We have had these reports before, but the problem is that it is only slow for some users. Here it is instant for example.

We need to find a way to reproduce.

lazka avatar Nov 15 '23 21:11 lazka

We need to find a way to reproduce.

how can i help?

goyalyashpal avatar Nov 15 '23 21:11 goyalyashpal

oh, the msys on my system is in a custom location (/d/msys64), on an HDD partition of 449 GB with 342 GB free.

if i recall correctly (might be wrong too), it was fast when it was in the default location (/c/msys64), on an SSD with a capacity of 118 GB.

goyalyashpal avatar Nov 15 '23 21:11 goyalyashpal

Note that the check can be disabled in the pacman config. That's just a workaround of course.

lazka avatar Nov 15 '23 21:11 lazka

Years ago, when I checked this on a very old HDD, it was caused by Microsoft Defender, even though I had added the MSYS2 directory to the exclusions. Temporarily disabling Defender made it much faster. I think Pacman might be doing something that is normal on Unix but slow when emulated by Cygwin on Windows/NTFS, and that also causes AV software to do a lot of unnecessary work.

mati865 avatar Nov 16 '23 20:11 mati865

I think Pacman might be doing something that is normal on Unix but slow when emulated by Cygwin on Windows/NTFS

yeah...

  • the du for calculating disk usage is also suppppperrrr slow
  • du -hd1 -t500MiB "$LOCALAPPDATA" took 20 minutes
  • whereas doublecmd does the equivalent in less than a minute (via "Show Occupied Space" S-M-Ret < Commands < Menu bar). see attached screenshot.
$ # diskusage -hd1 -t500MiB
$ du --human-readable --max-depth=1 --threshold=500MiB "$LOCALAPPDATA"
du: cannot read directory './ElevatedDiagnostics': Permission denied
3.1G    ./FXHOME
756M    ./GitHubDesktop
763M    ./JetBrains
du: cannot read directory './Microsoft/Windows/INetCache/Low/Content.IE5': Permission denied
2.1G    ./Microsoft
du: cannot read directory './Packages/B9ECED6F.ASUSBatteryHealthCharging_qmba6cd70vzyy/SystemAppData/Helium/Cache': Permission denied
du: cannot read directory './Packages/B9ECED6F.ASUSKeyboardHotkeys_qmba6cd70vzyy/SystemAppData/Helium/Cache': Permission denied
1.1G    ./Packages
740M    ./pip
637M    ./Programs
1.3G    ./Vivaldi
14G     .

Screenshot from doublecmd :

goyalyashpal avatar Nov 20 '23 15:11 goyalyashpal

self hiding as off topic: as this ain't about slowness of calculating disk size feel free to open this in a dedicated discussion if you are feeling generous & want to explain this 😋


whereas the doublecmd does this in less than a minute:

P.S. also note the differences in the output sizes,

$ du --help | grep -Eni "apparent|default"
8:      --apparent-size   print apparent sizes, rather than disk usage; although
9:                          the apparent size is usually smaller, it may be
15:  -b, --bytes           equivalent to '--apparent-size --block-size=1'
33:  -P, --no-dereference  don't follow any symbolic links (this is the default)
54:Otherwise, units default to 1024 bytes (or 512 if POSIXLY_CORRECT is set).

$ echo $DU_BLOCK_SIZE, $BLOCK_SIZE and $BLOCKSIZE, or $POSIXLY_CORRECT
, and , or

$ # -d0 eq. --max-depth=0 eq. --summary eq. -s
$ # -B1024 eq --block-size=1024 eq neutral/identity/default option
$ du "$LOCALAPPDATA/FXHOME" -d0 -B1024
3148496 FXHOME

$ du "$LOCALAPPDATA/FXHOME" -d0 --apparent -B1024
3148084 FXHOME

$ # Only this one matches with "Size" field in windows Properties dialog
$ du "$LOCALAPPDATA/FXHOME" -d0 -b
3223637515      FXHOME

$ qalc 3148496*1024, 3148084*1024
[3148496 * 1024, 3148084 * 1024] = [3224059904, 3223638016]

how come -b i.e. --apparent --block-size=1 (= 3223637515) is not equal to --apparent --block-size=1024 * 1024 (= 3148496 * 1024 = 3224059904) ??

Compare this with the values shown in windows Properties dialog for the directory:

Size:          3.00 GB (3,223,637,515 bytes)
Size on disk:  3.00 GB (3,223,990,272 bytes)
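
One possible reading of the numbers above, offered as a sketch rather than a definitive answer: when du prints sizes in block units it appears to round up to the next whole block, so the byte count from -b does not have to be an exact multiple of 1024. Note also that 3148496 is the disk-usage figure; the --apparent-size -B1024 run printed 3148084, and that is the value to compare with -b. The helper name blocks_ceil is made up for illustration.

```shell
# Round a byte count up to whole blocks (integer ceiling division).
# blocks_ceil is a hypothetical helper, not part of coreutils.
blocks_ceil() {
    bytes=$1
    block=$2
    echo $(( (bytes + block - 1) / block ))
}

# Byte count reported by `du -b`, expressed in 1024-byte blocks.
blocks_ceil 3223637515 1024   # prints 3148084
```

So 3148084 * 1024 = 3223638016 is simply the 3223637515 bytes from -b plus 501 bytes of rounding, which matches the qalc output above.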

goyalyashpal avatar Nov 20 '23 16:11 goyalyashpal

$ # Only this one matches with "Size" field in windows Properties dialog
$ du "$LOCALAPPDATA/FXHOME" -d0 -b

i thought maybe this is fetching from windows, so, could it be fast? but no, i retried after closing all the shells - and it is still taking time.

takeaways:

  • closing the shell seems to clear the cache - so, it seems that original behaviour of the command is shown again. important as executing du on [edit: some list of subdirs of] same argument again in same session yields result instantly
  • the idea of "maybe it's fast" is wrong

i also took this opportunity to run time on it. i have omitted the output of du here, as it's exactly the same as above with the values in different units. [edit: --human-readable and -b conflict 😅, so avoid using them together; i have replaced -b with --apparent-size]

$ time du --human-readable --max-depth=1 --threshold=500MiB "$LOCALAPPDATA" --apparent-size
du: cannot read directory './ElevatedDiagnostics': Permission denied
3223637515      ...
  ...

real    19m12.818s
user    0m6.703s  
sys     1m7.219s  

goyalyashpal avatar Nov 20 '23 16:11 goyalyashpal

Some ideas:

  • use Strace or Procmon to see what / how many IO operations are happening
  • Du is not doing the same thing as Pacman, but there might be similarities
  • Pacman's algorithm doesn't seem to be doing anything egregious other than stat-ing every file that's going to be removed or replaced; if that's the issue, I'd assume installing new packages is going to be very fast, but upgrading and removing, especially with many files, can be very slow
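
To look at the stat-ing cost in isolation, without running an upgrade, one rough approximation is to stat every path an installed package owns and time that. This is only a sketch under assumptions: pacman's disk space check does more than this, and stat_file_list is a hypothetical helper name. pacman -Qlq lists the paths owned by a package, one per line.

```shell
# stat_file_list: read one path per line on stdin, stat each of them,
# and print how many were stat-ed successfully.
# Hypothetical helper for illustration; relies on GNU xargs and GNU stat.
stat_file_list() {
    # -d '\n' keeps paths containing spaces intact; -r skips empty input.
    xargs -d '\n' -r stat --format=%s 2>/dev/null | wc -l
}

# Usage from an MSYS2 shell (the package name is just an example):
#   time pacman -Qlq bash | stat_file_list
```

If this takes seconds for a package with a few hundred files, that would point at stat itself rather than at anything specific to pacman.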

elieux avatar Nov 20 '23 18:11 elieux

  • is there some way to make pacman perform only this "checking available disk space" step? having it coupled to an upgrade makes it much harder for me to investigate this issue in the context of pacman.

  • as for du - yeah sure, i will try to learn the aforementioned things (strace/procmon) and use them over du.

  • also, i noticed that \time outputs in this format (executed on a dummy command):

0.03user 0.14system 0:00.17elapsed 94%CPU (0avgtext+0avgdata 9920maxresident)k
0inputs+0outputs (2633major+0minor)pagefaults 0swaps

is this what you are pointing towards? would this be enough?

goyalyashpal avatar Nov 20 '23 18:11 goyalyashpal

@goyalyashpal, here are my results: about a minute and a half.

$ time du --human-readable --max-depth=1 --threshold=500MiB "$LOCALAPPDATA" -b
1263105728      C:\Users\saukrs\AppData\Local/0install.net
3674538443      C:\Users\saukrs\AppData\Local/Autodesk
du: cannot read directory 'C:\Users\saukrs\AppData\Local/ElevatedDiagnostics': Permission denied
du: cannot read directory 'C:\Users\saukrs\AppData\Local/Google/Chrome/User Data/CertificateRevocation/8365': Permission denied
du: cannot read directory 'C:\Users\saukrs\AppData\Local/Google/Chrome/User Data/CertificateRevocation/8367': Permission denied
du: cannot read directory 'C:\Users\saukrs\AppData\Local/Google/Chrome/User Data/OptimizationHints/421': Permission denied
6783482254      C:\Users\saukrs\AppData\Local/Google
du: cannot read directory 'C:\Users\saukrs\AppData\Local/Microsoft/Windows/INetCache/Low/Content.IE5': Permission denied
2406862526      C:\Users\saukrs\AppData\Local/Microsoft
du: cannot read directory 'C:\Users\saukrs\AppData\Local/Programs/Opera/.opera/1DCCC0B63140': Permission denied
du: cannot read directory 'C:\Users\saukrs\AppData\Local/Temp/WinSAT': Permission denied
15325201831     C:\Users\saukrs\AppData\Local

real    1m34.078s
user    0m3.968s
sys     0m31.968s

Note that rerunning the command immediately didn't change the timing:

$ time du --human-readable --max-depth=1 --threshold=500MiB "$LOCALAPPDATA" -b
  ...
du: cannot read directory 'C:\Users\saukrs\AppData\Local/Programs/Opera/.opera/1DCCC0B63140': Permission denied
du: cannot read directory 'C:\Users\saukrs\AppData\Local/Temp/WinSAT': Permission denied
15327079606     C:\Users\saukrs\AppData\Local

real    1m32.405s
user    0m4.640s
sys     0m37.343s

Maybe that's due to MP Realtime Protection being off here at the moment:

$ powershell '$prefs = Get-MpPreference; $prefs.DisableRealtimeMonitoring'
True

Maybe you should check that and disable it if it's enabled:

$ sudo powershell 'Set-MpPreference -DisableRealtimeMonitoring $true'

Note also that an elevated du run is a bit shorter:

$ time sudo du --human-readable --max-depth=1 --threshold=500MiB "$LOCALAPPDATA" -b
1263105728      C:\Users\saukrs\AppData\Local/0install.net
3674538443      C:\Users\saukrs\AppData\Local/Autodesk
6785324008      C:\Users\saukrs\AppData\Local/Google
2406862776      C:\Users\saukrs\AppData\Local/Microsoft
15326500357     C:\Users\saukrs\AppData\Local

real    1m12.072s
user    0m0.015s
sys     0m0.015s

I used gsudo for that.

sskras avatar Nov 20 '23 19:11 sskras

@elieux commented 1 hour ago:

  • Du is not doing the same thing as Pacman, but there might be similarities
  • Pacman's algorithm doesn't seem to be doing anything egregious other than stat-ing every file that's going to be removed or replaced;

My impression of MSYS2 is that stat-ing is quite slow. Maybe not as slow as @goyalyashpal is experiencing on his Windows box, but still annoying.

BTW, Midipix is like 4x faster:

midipix@DESKTOP-O7JE7JE ~
$ time du --human-readable --max-depth=1 --threshold=500MiB 'C:\Users\saukrs\AppData\Local' -b
1269876416      C:\Users\saukrs\AppData\Local/0install.net
3695960523      C:\Users\saukrs\AppData\Local/Autodesk
du: cannot access 'C:\Users\saukrs\AppData\Local/Google/Chrome/User Data/CrashpadMetrics.pma': No such file or directory
6803580813      C:\Users\saukrs\AppData\Local/Google
2426903812      C:\Users\saukrs\AppData\Local/Microsoft
15399724254     C:\Users\saukrs\AppData\Local

real    0m18.005s
user    0m0.000s
sys     0m0.000s

sskras avatar Nov 20 '23 19:11 sskras

PS. Pressing Alt-Enter for this directory in the Explorer exhibits calculation times around 26 seconds:

image

This timing is more similar to Midipix (which uses NTAPI to get the metadata) than to MSYS2 (or Cygwin for that matter).

sskras avatar Nov 20 '23 19:11 sskras

My impression of MSYS2 is that stat-ing is quite slow. Maybe not as slow as @goyalyashpal is experiencing on his Windows box, but still annoying.

My recollection of why stat is slow is that for any stat of foo it will also try to stat foo.exe and at least check whether foo.lnk exists (for the symlinks-emulated-via-.lnk-file feature). If whatever cache(s) exist don't cache an error (ENOENT) result, that would make it even worse...

jeremyd2019 avatar Nov 21 '23 04:11 jeremyd2019

I've posted very short test results at https://github.com/msys2/msys2-pacman/issues/32#issuecomment-1973629270. TL;DR: I recommend installing MSYS2 inside a Dev Drive.

mati865 avatar Mar 01 '24 18:03 mati865

Getting the stat of files on Windows is very slow when AV/malware detectors are running. The solution I use when checking the stat of many files is to do a few of them in parallel. This is a lot faster and does not seem to have a negative impact on system performance. I am guessing that Windows is just waiting for something inside each stat request. This problem is not unique to MSYS. The same applies when opening files for reading and/or writing: there is a significant delay when opening the file, which I guess is related to the stat problem.

As my code for this is written in C#, we would need a C implementation that can be used in MSYS: a small library for bulk stat-ing and bulk opening of files that uses async I/O or threads.
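
The effect of doing several stats in parallel can be tried from the shell before anyone writes a library. The sketch below is an assumption-laden stand-in for the C# approach described above, not a port of it: it uses separate stat processes via xargs -P instead of threads or async I/O, and process creation is itself expensive under Cygwin/MSYS2, so the numbers are only indicative. parallel_stat is a hypothetical helper name.

```shell
# parallel_stat DIR [WORKERS]: stat every regular file under DIR using up to
# WORKERS concurrent stat processes, and print the number of files stat-ed.
# Hypothetical helper for illustration; relies on GNU find, xargs and stat.
parallel_stat() {
    dir=$1
    workers=${2:-8}
    find "$dir" -type f -print0 |
        xargs -0 -r -n 64 -P "$workers" stat --format=%s 2>/dev/null |
        wc -l
}

# Usage: compare one worker against several on the same tree.
#   time parallel_stat /usr/share 1
#   time parallel_stat /usr/share 8
```

Run the single-worker case first and again afterwards, since the thread above shows that caching can make a repeated run look faster than it really is.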

I also found the following issue, which seems to indicate that stat-ing a file actually opens it to get a file handle before getting the information, but that there is a new API in Windows 11 that doesn't open the file and is therefore much faster: https://github.com/libuv/libuv/pull/4327

Perhaps using libuv is the solution if there is a mingw version where it is compiled to use the Windows API?

This however doesn't solve this issue for older versions of Windows but that may be something that we can live with. It also doesn't solve that opening a file is slow.

mateli avatar Apr 22 '25 09:04 mateli

Ideally, this would be integrated into Cygwin. Using Win32 API for that is really far from ideal because you'd have to do conversions between different path formats yourself.

mati865 avatar Apr 22 '25 17:04 mati865

Also, Cygwin's stat will actually read from files in order to determine the 'execute' bit (is it an MZ or #!) in the default case here of noacl

jeremyd2019 avatar Apr 22 '25 17:04 jeremyd2019

We have had these reports before, but the problem is that it is only slow for some users. Here it is instant for example.

We need to find a way to reproduce.

I'm on Windows 11 Pro, with NVMe drives only: C: (120 GB, 10 GB free) and D: (350 GB, 66 GB free); MSYS2 is installed at D:\msys64. This thing takes minutes. Not 1-2, but 5-10 minutes.

Note that the check can be disabled in the pacman config. That's just a workaround of course.

Please, @lazka how can I disable it??

pps83 avatar Apr 27 '25 17:04 pps83

done... (it was around 70% when I wrote the previous message). So, completing the last 20-30% took around 8 minutes. It was more than 20 minutes in total for sure. While it was doing the free-space check, only MsMpEng.exe (the Microsoft antivirus) was using CPU (one core only). I did not add the msys64 dir to the AV exclusion folders (perhaps adding it would avoid the issue).

pps83 avatar Apr 27 '25 17:04 pps83

Please, how can I disable it??

Comment out the CheckSpace option in /etc/pacman.conf.
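
For anyone who prefers to script that edit, a minimal sketch follows. It assumes the option appears on a line of its own as CheckSpace, as in a stock pacman.conf; disable_checkspace is a hypothetical helper name, and the backup file name is arbitrary.

```shell
# disable_checkspace [CONF]: comment out the CheckSpace line in a
# pacman.conf-style file, keeping a backup next to the original.
# Hypothetical helper for illustration; relies on GNU sed's -i option.
disable_checkspace() {
    conf=${1:-/etc/pacman.conf}
    cp -- "$conf" "$conf.bak" &&
        sed -i 's/^[[:space:]]*CheckSpace[[:space:]]*$/#CheckSpace/' "$conf"
}

# Usage (from a shell that can write to /etc):
#   disable_checkspace /etc/pacman.conf
#   grep -n 'CheckSpace' /etc/pacman.conf
```

As noted earlier in the thread, this is only a workaround: pacman will no longer warn if the target drive is about to run out of space.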

jeremyd2019 avatar Apr 27 '25 17:04 jeremyd2019

i am running linux (nixos) with the same NTFS filesystem, and computing disk space (in normal usage) in both the nemo and doublecmd file managers is equally (figuratively) slow.

so i think it is some inherent problem with unixy things while dealing with NTFS.

$ nix run nixpkgs#inxi -- --filter --system --extra  # -zSx
System:
  Kernel: 6.12.34 arch: x86_64 bits: 64 compiler: gcc v: 14.3.0
  Desktop: Cinnamon v: 6.4.7 Distro: NixOS 25.11 (Xantusia)

goyalyashpal avatar Jul 02 '25 09:07 goyalyashpal

I'm experiencing this as well but FWIW after waiting ten minutes for this to finish, turning off Windows Defender real-time protection made the space check finish nearly instantly.

eiis1000 avatar Oct 02 '25 19:10 eiis1000