
Work in progress: 1810x increase in extraction speed for solid archives. Fixes issue #21

Open fartwhif opened this issue 7 years ago • 4 comments

Fixes issue #21. Changes: brought a direct stream getter delegate out; made the stream classes public; changed the signature of the "extract all files with callback" function; removed garbage collection bossiness.

fartwhif avatar Sep 09 '18 14:09 fartwhif

It's very hacky. I need to re-hide the stream classes and revert the extract API to its previous signature and method. For extraction of a single entry, as opposed to everything, it may be necessary to infer the index of the stream 7z is asking for. Don't add an outer loop to do this, as doing so will reintroduce the problem.

fartwhif avatar Sep 09 '18 14:09 fartwhif

Large solid archives are incredibly slow, both for extracting the entire archive and for extracting a single file towards the end of the data (a relatively high index). This PR provides a massively improved way to do both. It's still not as good as using 7z normally, but it's orders of magnitude better than before. One thing to note is that the project lacks large (>100MB) archive files to test with. All the C# wrappers and pure .NET libraries seem to suffer from this problem in different ways.
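To make the cost concrete, here is a rough toy model, not SevenZipSharp code. It assumes a single solid block that can only be decoded from its start, and it counts decoded entries rather than bytes. The class and method names (`SolidCostSketch`, `RestartPerEntry`, `SinglePass`) are hypothetical and exist only for this illustration.

```csharp
using System;

// Toy cost model (assumption: one solid block, decodable only from its start).
// It counts how many entries get decompressed; it does not touch any archive.
public static class SolidCostSketch
{
    // Reaching entry `index` in a solid block means decoding entries 0..index.
    public static long EntriesDecodedToReach(int index) => index + 1L;

    // An outer loop that requests each entry separately restarts the block
    // every time, so the total grows quadratically: n * (n + 1) / 2.
    public static long RestartPerEntry(int entryCount)
    {
        long total = 0;
        for (int i = 0; i < entryCount; i++)
        {
            total += EntriesDecodedToReach(i);
        }
        return total;
    }

    // A single pass hands each entry over as the decoder reaches it,
    // so every entry is decoded exactly once.
    public static long SinglePass(int entryCount) => entryCount;
}
```

Under this model, an archive with 1,000 entries costs 500,500 entry decodes with the restart-per-entry approach and 1,000 with a single pass. Real archives differ (entries vary in size, and an archive may hold several solid blocks), so this only shows the shape of the problem, not the measured speedup.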

fartwhif avatar Sep 18 '18 01:09 fartwhif

So I'm still not happy about exposing so many internals to support this solution.

Given that the documentation for ExtractFiles (with callback) states that 7-Zip (and any other solid) archives are NOT supported, I'm more inclined to simply add something like:

if (IsSolid)
{
    throw new NotSupportedException("Solid archives are not supported.");
}

Do you actually need the per-file callback, or could you use the other extraction methods that don't have this slowdown?

squid-box avatar Sep 22 '18 18:09 squid-box

Yes, my solution essentially needs a callback per file entry, for both solid and non-solid archives. If testing is to reveal jaw-dropping problems like this, then the test scenarios need to include a large solid archive containing many file entries, with one objective of extracting a file entry near the end of it and another of extracting all file entries from it. I may have more to add to this PR; I'll have to check tonight. Inferring things through nested loops and wrapping things too much causes this. Take it or leave it, I'm just saying THANKS VERY MUCH for getting the ball rolling for me!

fartwhif avatar Nov 20 '18 14:11 fartwhif