xz
Unzipping is too slow
When I tried to unzip a big file (about 3 GiB as xz and about 18 GiB unpacked), the process was too slow: only 3 GiB of the 18 GiB were unpacked after about 40 minutes on my machine. The same file was unpacked in about 5 minutes using the 7-Zip tool.
Thank you for reporting. This is expected, and I have the following language in README.md:
At this time the package cannot compete with the xz tool regarding compression speed and size.
I haven't found the time so far to work on code optimization. On the plus side, there is a lot of potential for improving the situation. Unfortunately, I cannot promise when I will work on it.
There is work ahead. I left the issue open.
I just ran into slow decompression and the (partial) solution is to wrap your reader in bufio.NewReader(). It turns out this library uses ReadByte() a great deal and on unbuffered input this is incredibly slow.
I say "partial" as unfortunately this fails on some inputs with
writeMatch: distance out of range
Very weird that it fails when buffered but works when unbuffered..
Yes, the library doesn't implement its own buffering and because it uses ReadByte it benefits from buffered readers. I should have documented it.
My rationale at the time was that I wanted to use a buffered reader only if there is a need for it. For instance, I didn't want to use a buffered reader for a bytes.Buffer.
A buffered reader shouldn't make a difference for the reading process. The gxz tool is using a buffered reader and I have run extensive tests for it.
Can you provide the file that you want to decompress?
Sure, I was decompressing the Zig tarballs from here.
Fixed!
I have now downloaded all 0.8.0 files and decompressed them with the gxz tool, which uses bufio.Reader, and there were no problems decompressing any of them.
Please provide:
- name of the actual file generating issues
- version of the xz module
- the code you are using to decompress the file
- output of go env
Oh you're asking for the failing one, sorry, that wasn't clear - I thought you were asking for one of the slow ones.
This is the one that fails. Interestingly it also fails with github.com/xi2/xz
Hi, this is a deb file, which is an ar archive. You must do the following:
$ ar xv bzip2_1.0.6-9.2_deb10u1_amd64.deb
x - debian-binary
x - control.tar.xz
x - data.tar.xz
The two xz files can easily be uncompressed and generate no issues for me. The debian-binary is a plain-text file. Information about the deb format can be found in the manual page for deb.
I reran my test using @alecthomas's suggestion. It is still slower than xi2/xz, but it was a huge speedup:
package test

import (
	"archive/tar"
	"bufio"
	"io"
	"os"
	"path"
	"testing"

	ulikunitz "github.com/ulikunitz/xz"
	xi2 "github.com/xi2/xz"
)

const cargo = "cargo-1.54.0-x86_64-pc-windows-gnu.tar.xz"

// readFrom extracts all regular files from the tar stream r.
func readFrom(r io.Reader) error {
	tr := tar.NewReader(r)
	for {
		n, err := tr.Next()
		if err == io.EOF {
			break
		} else if err != nil {
			return err
		} else if n.Typeflag != tar.TypeReg {
			continue
		}
		// os.ModeDir carries no permission bits, so use an explicit mode.
		if err := os.MkdirAll(path.Dir(n.Name), 0o755); err != nil {
			return err
		}
		f, err := os.Create(n.Name)
		if err != nil {
			return err
		}
		// Close each file right away; a defer inside the loop would keep
		// every file open until the function returns.
		_, err = f.ReadFrom(tr)
		if cerr := f.Close(); err == nil {
			err = cerr
		}
		if err != nil {
			return err
		}
	}
	return nil
}

// 0.905s
func TestUlikunitz(t *testing.T) {
	f, err := os.Open(cargo)
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()
	// The bufio.Reader is what provides the speedup.
	r, err := ulikunitz.NewReader(bufio.NewReader(f))
	if err != nil {
		t.Fatal(err)
	}
	if err := readFrom(r); err != nil {
		t.Fatal(err)
	}
}

// 0.614s
func TestXi2(t *testing.T) {
	f, err := os.Open(cargo)
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()
	r, err := xi2.NewReader(f, 0)
	if err != nil {
		t.Fatal(err)
	}
	if err := readFrom(r); err != nil {
		t.Fatal(err)
	}
}
I used xz to unpack Python-3.11.4.xz. Using Python 3.10 it took 4 sec; using Go it took 1 min 55 sec. So I do think Go xz has a speed issue.
I just tried github.com/therootcompany/xz and it took 5 sec.
I posted this two years ago but it got deleted. Here it is again; it should help with the speed:
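package test

import (
	"archive/tar"
	"bufio"
	"github.com/ulikunitz/xz"
	"io"
	"os"
	"path"
	"testing"
)

const cargo = "cargo-1.54.0-x86_64-pc-windows-gnu.tar.xz"

// readFrom extracts all regular files from the tar stream r.
func readFrom(r io.Reader) error {
	tr := tar.NewReader(r)
	for {
		n, err := tr.Next()
		if err == io.EOF {
			break
		} else if err != nil {
			return err
		} else if n.Typeflag != tar.TypeReg {
			continue
		}
		// os.ModeDir carries no permission bits, so use an explicit mode.
		if err := os.MkdirAll(path.Dir(n.Name), 0o755); err != nil {
			return err
		}
		f, err := os.Create(n.Name)
		if err != nil {
			return err
		}
		// Close each file right away; a defer inside the loop would keep
		// every file open until the function returns.
		_, err = f.ReadFrom(tr)
		if cerr := f.Close(); err == nil {
			err = cerr
		}
		if err != nil {
			return err
		}
	}
	return nil
}

func TestUlikunitz(t *testing.T) {
	f, err := os.Open(cargo)
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()
	// Wrapping the file in a bufio.Reader is the speed fix.
	r, err := xz.NewReader(bufio.NewReader(f))
	if err != nil {
		t.Fatal(err)
	}
	if err := readFrom(r); err != nil {
		t.Fatal(err)
	}
}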