
Extraction of tar.gz file takes too long

Open shenghuang147 opened this issue 1 year ago • 4 comments

Device Model: XiaoMi 10 Ultra
Android Version: 10 QKQ1.200419.002
MIUI Version: xiaomi.eu 12.0.10
MaterialFiles Version: 1.7.2 (37)
Source: F-Droid

I compressed all the font files from the Fonts folder of Windows 11 21H2 into Fonts.tar.gz, which contains 336 font files in total.

I found that MaterialFiles takes about 5 minutes to fully extract this file, while running tar -zxf Fonts.tar.gz in Termux takes only 3.3 seconds:

.../Fonts $ time tar -zxf ../Download/Fonts.tar.gz

real    0m3.300s
user    0m2.785s
sys     0m0.482s

start: 00:55:40

[image]

end: 01:00:37

[image]

shenghuang147 avatar May 25 '24 16:05 shenghuang147

How large is the tar.gz file itself? My suspicion is that the decompression of every single file requires reading the archive from the beginning again. Not sure if this can be optimized easily.

zhanghai avatar May 25 '24 21:05 zhanghai

How large is the tar.gz file itself? My suspicion is that the decompression of every single file requires reading the archive from the beginning again. Not sure if this can be optimized easily.

only 191MB

shenghuang147 avatar May 26 '24 02:05 shenghuang147

Then that is probably the reason: it ends up decompressing roughly 300 * 100 MB = 30 GB of data.
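The estimate can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the 336 entries are spread evenly through the 191 MB archive and that reaching entry k means reading the archive from the start up to that entry. It counts compressed bytes read, so the amount of decompressed data would be larger. The function name is made up for illustration.

```python
def total_compressed_mb_read(n_files, archive_mb):
    """Sum the compressed MB read if every entry is reached from the start."""
    # Assumption: entry k (1-based) sits about k/n_files of the way in.
    return sum(archive_mb * k / n_files for k in range(1, n_files + 1))


# Figures from this thread: 336 font files in a 191 MB archive.
estimate_mb = total_compressed_mb_read(336, 191)
single_pass_mb = 191  # one sequential pass reads the archive once

print(f"per-entry rescans: about {estimate_mb / 1000:.0f} GB")
print(f"single pass:       about {single_pass_mb / 1000:.2f} GB")
```

On average each entry costs about half the archive, which is where the rounded 300 * 100 MB figure comes from.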

zhanghai avatar May 26 '24 02:05 zhanghai

Yeah, I have this issue as well. It depends on the library implementation. Maybe the "extract all" case specifically should be optimised.

Feuerswut avatar Jun 23 '24 22:06 Feuerswut

My suspicion is that the decompression of every single file requires reading the archive from the beginning again

This is not how it's supposed to work, and tar on the CLI doesn't work like that either. A .tar.gz is archived first and then compressed (not the other way around), so it only needs to be decompressed once at the beginning, giving you a .tar (temporarily), and then the files should be extracted from that tar. After all files are extracted, the temporary tar is deleted. So there should be no decompressing of all the individual files, only of the archive itself.
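As an illustration of "decompress only once", here is a minimal Python sketch using the standard tarfile module in stream mode ("r|gz"). It is not MaterialFiles code, and the archive is a small synthetic stand-in for Fonts.tar.gz. In stream mode the gzip data is decoded in a single forward pass and each entry is read as it arrives, so no temporary .tar has to be written at all. CountingReader and the helper names are hypothetical, added only to show how many compressed bytes are consumed.

```python
import io
import tarfile


class CountingReader:
    """Wraps a binary stream and counts the compressed bytes handed out."""

    def __init__(self, raw):
        self.raw = raw
        self.bytes_read = 0

    def read(self, size=-1):
        data = self.raw.read(size)
        self.bytes_read += len(data)
        return data


def build_archive(n_files, size):
    """Build a small in-memory .tar.gz (stand-in for the real archive)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for i in range(n_files):
            info = tarfile.TarInfo(name=f"font{i:03d}.ttf")
            info.size = size
            tar.addfile(info, io.BytesIO(bytes([i % 256]) * size))
    return buf.getvalue()


def stream_extract(archive):
    """Read every entry in one forward pass; "r|gz" cannot seek backwards."""
    source = CountingReader(io.BytesIO(archive))
    contents = {}
    with tarfile.open(fileobj=source, mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents, source.bytes_read


archive = build_archive(n_files=5, size=2000)
contents, compressed_read = stream_extract(archive)
print(len(contents), "entries;", compressed_read, "of", len(archive), "compressed bytes read")
```

The compressed input is consumed at most once, regardless of how many entries the archive holds.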

masterflitzer avatar Nov 10 '24 12:11 masterflitzer

This is not how it's supposed to work, and tar on the CLI doesn't work like that either. A .tar.gz is archived first and then compressed (not the other way around), so it only needs to be decompressed once at the beginning, giving you a .tar (temporarily), and then the files should be extracted from that tar. After all files are extracted, the temporary tar is deleted. So there should be no decompressing of all the individual files, only of the archive itself.

That doesn't really matter. As long as the archive format doesn't support random access (TAR doesn't have an index), if you try to read one file independently of the other files, you'll always have to start from the beginning. More complex logic could be implemented so that reads of different files from the same TAR archive are coordinated in some way and only one pass is needed, but I don't have the bandwidth for that right now.
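The difference between the two strategies can be sketched with Python's standard tarfile module. This is an illustration only, not the MaterialFiles implementation: the archive is synthetic and the function names are hypothetical. The sketch counts how many entry headers are visited, which is a proxy for the work done, since reaching an entry in a .tar.gz also means decompressing everything before it.

```python
import io
import tarfile


def build_archive(n_files, size):
    """Build a small in-memory .tar.gz with n_files entries of `size` bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for i in range(n_files):
            info = tarfile.TarInfo(name=f"font{i:03d}.ttf")
            info.size = size
            tar.addfile(info, io.BytesIO(bytes([i % 256]) * size))
    return buf.getvalue()


def extract_one_by_one(archive, names):
    """Treat each file as an independent read: reopen and scan from the start."""
    contents, headers_visited = {}, 0
    for name in names:
        with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
            for member in tar:
                headers_visited += 1
                if member.name == name:
                    contents[name] = tar.extractfile(member).read()
                    break
    return contents, headers_visited


def extract_in_one_pass(archive):
    """Coordinate all reads: walk the archive once and take every entry."""
    contents, headers_visited = {}, 0
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar:
            headers_visited += 1
            contents[member.name] = tar.extractfile(member).read()
    return contents, headers_visited


archive = build_archive(n_files=20, size=1000)
names = [f"font{i:03d}.ttf" for i in range(20)]
slow, slow_visits = extract_one_by_one(archive, names)
fast, fast_visits = extract_in_one_pass(archive)
print(slow_visits, "headers visited one by one vs", fast_visits, "in one pass")
```

For n entries, the independent approach visits n(n+1)/2 headers while the coordinated pass visits n, so the gap widens quickly as archives grow.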

zhanghai avatar Nov 10 '24 21:11 zhanghai

But this is about extracting the whole archive, not individual files, so the equivalent of tar -axvf archive.tar

masterflitzer avatar Nov 11 '24 02:11 masterflitzer

Like I said earlier, extracting all files in an archive isn't different from extracting individual files as a file operation right now.

zhanghai avatar Nov 11 '24 16:11 zhanghai

Yes, I know. All I'm saying is that the distinction between extracting all files and extracting individual files needs to be implemented, and then the issue is fixed.

masterflitzer avatar Nov 11 '24 19:11 masterflitzer