MaterialFiles
Extraction of tar.gz file takes too long
Device Model: XiaoMi 10 Ultra
Android Version: 10 QKQ1.200419.002
MIUI Version: xiaomi.eu 12.0.10
MaterialFiles Version: 1.7.2 (37)
Source: F-Droid
I compressed all the font files from the Fonts folder of Windows 11 21H2 into Fonts.tar.gz, which contains a total of 336 font files.
I found that MaterialFiles takes about 5 minutes to fully extract this archive, while tar -zxf Fonts.tar.gz in Termux only takes 3.3 seconds:
.../Fonts $ time tar -zxf ../Download/Fonts.tar.gz
real 0m3.300s
user 0m2.785s
sys 0m0.482s
MaterialFiles extraction: start 00:55:40, end 01:00:37
How large is the tar.gz file itself? My suspicion is that the decompression of every single file requires reading the archive from the beginning again. Not sure if this can be optimized easily.
Only 191 MB.
Then it may just be for that reason: decompressing roughly 300 * 100 MB = 30 GB of data in total.
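As a rough model (not stated in the thread, just making the arithmetic explicit): if each of the n entries restarts decompression from the start of the archive, the data decompressed per entry averages about half of the uncompressed payload S, so the total work is on the order of n * S / 2 instead of S. With a few hundred entries that is roughly two orders of magnitude more decompression, which is consistent with about 5 minutes versus 3.3 seconds.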
Yeah, I have this issue as well; it depends on the library implementation. Maybe the extract-all operation specifically should be optimised.
My suspicion is that the decompression of every single file requires reading the archive from the beginning again
This is not how it's supposed to work, and tar on the CLI doesn't work like that either. A .tar.gz is archived and then compressed (not the other way around), so it needs to be decompressed only once at the beginning, giving you a .tar (temporarily), and then the files get extracted from that tar; after all files are extracted the temporary tar is deleted. So there should be no decompressing per individual file, only of the archive itself.
That doesn't really matter. As long as the archive format doesn't support random access (TAR doesn't have an index), if you try to read one file independent of the other files you'll always have to start from the beginning. More complex logic could be implemented so that reading of different files from the same TAR archive can be coordinated in some way so that only one pass is needed, but I don't have the bandwidth for that right now.
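For illustration only (this is not MaterialFiles' actual code, and the app's real archive backend may differ), here is roughly what a generic per-entry read of a tar.gz looks like with Apache Commons Compress. Because TAR has no index, reaching one entry means decompressing and skipping everything before it, and an "extract all" built by calling this once per entry re-decompresses the archive n times:

```kotlin
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream
import java.io.BufferedInputStream
import java.io.File

// Hypothetical per-entry read: to reach `entryName`, the gzip stream is
// decompressed from the very beginning and every earlier entry is skipped.
// Calling this once per entry to "extract all" is where the blow-up comes from.
fun extractOneEntry(archive: File, entryName: String, destDir: File) {
    TarArchiveInputStream(
        GzipCompressorInputStream(BufferedInputStream(archive.inputStream()))
    ).use { tar ->
        var entry = tar.nextTarEntry
        while (entry != null) {
            if (entry.name == entryName && !entry.isDirectory) {
                val target = File(destDir, entry.name)
                target.parentFile?.mkdirs()
                target.outputStream().use { out -> tar.copyTo(out) }
                return
            }
            entry = tar.nextTarEntry
        }
    }
}
```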
But this is about extracting the whole archive, not individual files, so the equivalent of tar -axvf archive.tar.
Like I said earlier, extracting all files in an archive isn't different from extracting individual files as a file operation right now.
Yes, I know; all I'm saying is that the differentiation between extracting all files and extracting individual files needs to be implemented, and then the issue is fixed.
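A single-pass extract-all is what the thread is asking for. A minimal sketch, again assuming Apache Commons Compress (the app's real backend, error handling, and path sanitisation are more involved): the gzip stream is decompressed exactly once and each entry is written out as it streams past, which is essentially what tar -zxf does.

```kotlin
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream
import java.io.BufferedInputStream
import java.io.File

// Single-pass extract-all: one decompression of the gzip stream,
// entries written out in the order they appear in the tar.
fun extractAll(archive: File, destDir: File) {
    TarArchiveInputStream(
        GzipCompressorInputStream(BufferedInputStream(archive.inputStream()))
    ).use { tar ->
        var entry = tar.nextTarEntry
        while (entry != null) {
            // Note: a real implementation must also reject ".." path traversal.
            val target = File(destDir, entry.name)
            if (entry.isDirectory) {
                target.mkdirs()
            } else {
                target.parentFile?.mkdirs()
                target.outputStream().use { out -> tar.copyTo(out) }
            }
            entry = tar.nextTarEntry
        }
    }
}
```

With this shape, extracting the 336-entry archive does the same amount of decompression as the CLI, so the gap between 5 minutes and 3.3 seconds should largely disappear.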