
Snappy compression

Open ma6174 opened this issue 8 years ago • 1 comments

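I recently tested the Snappy compression algorithm. My overall impression is that both compression and decompression are very fast, and that it has many use cases.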

Official site: https://code.google.com/p/snappy/

Introduction

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

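In short, Snappy does not chase the best compression ratio; it optimizes for compression and decompression speed. On a single core of a 64-bit Core i7 it compresses at about 250 MB/s and decompresses at about 500 MB/s.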

snappy VS gzip

  • Test environment:
    • MacBook Pro (Retina, 13-inch, Mid 2014)
    • 2.6 GHz Intel Core i5
    • 8 GB 1600 MHz DDR3
  • Test file (results may differ slightly for other file types):
    • VirtualBox Ubuntu VDI image file, 3.8 GB in size
  • Test tools:
    • gzip: Apple gzip 242
    • snappy: snappy
Test results:
Tool    Compression speed  Decompression speed  Compression ratio (space saved)
snappy  146.6 MB/s         257.7 MB/s           54.8%
gzip    23.8 MB/s          233.1 MB/s           69.2%
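
A measurement like this can also be scripted. The following is only a rough sketch of the method, not the procedure used for the numbers in this post: it uses Python's built-in zlib at levels 1 and 9 as stand-ins (Snappy bindings are not part of the standard library), a synthetic in-memory sample instead of the VDI file, and a hypothetical helper named `measure`.

```python
import os
import time
import zlib


def measure(compress, decompress, data):
    """Return CPU-time throughput (MiB/s) and space saved (%) for one codec."""
    mib = len(data) / (1024 * 1024)

    start = time.process_time()
    packed = compress(data)
    compress_s = time.process_time() - start

    start = time.process_time()
    unpacked = decompress(packed)
    decompress_s = time.process_time() - start

    assert unpacked == data, "round trip must restore the input"
    return {
        # Guard against a zero reading from a coarse CPU clock.
        "compress_mib_s": mib / compress_s if compress_s > 0 else float("inf"),
        "decompress_mib_s": mib / decompress_s if decompress_s > 0 else float("inf"),
        "space_saved_pct": 100 * (1 - len(packed) / len(data)),
    }


# Synthetic sample (an assumption): repetitive text plus incompressible bytes.
sample = b"snappy vs gzip " * 200_000 + os.urandom(1_000_000)

fast = measure(lambda d: zlib.compress(d, 1), zlib.decompress, sample)
best = measure(lambda d: zlib.compress(d, 9), zlib.decompress, sample)

for name, result in (("zlib level 1", fast), ("zlib level 9", best)):
    print(name, {key: round(value, 1) for key, value in result.items()})
```

Any other codec could be passed in as the `compress` and `decompress` callables, provided both take and return bytes.
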
Detailed intermediate test data
  • Original file size:
$ time cat ubuntu_ele.vdi | wc -c
 4062183424
cat ubuntu_ele.vdi  0.08s user 3.61s system 15% cpu 23.522 total
wc -c  21.20s user 1.00s system 94% cpu 23.521 total
  • Compression comparison:
$ time cat ubuntu_ele.vdi | snappy | wc -c
 1837598172
cat ubuntu_ele.vdi  0.09s user 3.18s system 12% cpu 27.215 total
snappy  25.45s user 0.98s system 97% cpu 27.217 total
wc -c  13.59s user 0.91s system 53% cpu 27.216 total

$ time cat ubuntu_ele.vdi | gzip | wc -c
 1252402891
cat ubuntu_ele.vdi  0.08s user 2.63s system 1% cpu 2:44.34 total
gzip  161.52s user 1.11s system 98% cpu 2:44.34 total
wc -c  12.66s user 0.30s system 7% cpu 2:44.34 total
  • Decompression comparison:
$ time cat ubuntu_ele.vdi | snappy | snappy -d | wc -c
 4062183424
cat ubuntu_ele.vdi  0.09s user 3.12s system 9% cpu 33.553 total
snappy  28.39s user 1.31s system 88% cpu 33.552 total
snappy -d  13.36s user 1.67s system 44% cpu 33.552 total
wc -c  24.09s user 1.03s system 74% cpu 33.553 total

$ time cat ubuntu_ele.vdi | gzip | gzip -d | wc -c
 4062183424
cat ubuntu_ele.vdi  0.08s user 2.70s system 1% cpu 2:44.92 total
gzip  161.62s user 1.15s system 98% cpu 2:44.92 total
gzip -d  15.63s user 0.99s system 10% cpu 2:44.92 total
wc -c  24.36s user 0.71s system 15% cpu 2:44.92 total
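
The table figures can be recomputed from the raw output above. The sketch below assumes that each speed is the file size in MiB divided by the CPU time (user + system) of the compressor or decompressor process, and that the ratio column is the fraction of space saved. That reading is an inference; it reproduces the table to within rounding. The names `RUNS`, `summarize`, and `RESULTS` are only illustrative.

```python
# Raw figures copied from the `time` output above.
ORIGINAL_BYTES = 4062183424
MIB = 1024 * 1024

# tool -> (compressed bytes, compress CPU seconds, decompress CPU seconds).
# CPU seconds = user + system time of that process (assumed basis of the table).
RUNS = {
    "snappy": (1837598172, 25.45 + 0.98, 13.36 + 1.67),
    "gzip": (1252402891, 161.52 + 1.11, 15.63 + 0.99),
}


def summarize(compressed_bytes, compress_cpu_s, decompress_cpu_s):
    """Derive throughput in MiB per CPU-second and the space saved in percent."""
    size_mib = ORIGINAL_BYTES / MIB
    return {
        "compress_mib_s": size_mib / compress_cpu_s,
        "decompress_mib_s": size_mib / decompress_cpu_s,
        "space_saved_pct": 100 * (1 - compressed_bytes / ORIGINAL_BYTES),
    }


RESULTS = {tool: summarize(*run) for tool, run in RUNS.items()}

for tool, result in RESULTS.items():
    print(
        f"{tool}: compress {result['compress_mib_s']:.1f} MiB/s, "
        f"decompress {result['decompress_mib_s']:.1f} MiB/s, "
        f"saved {result['space_saved_pct']:.1f}%"
    )

# How much larger the snappy output is than the gzip output, in percent.
SIZE_PENALTY_PCT = 100 * (RUNS["snappy"][0] / RUNS["gzip"][0] - 1)
# How many times faster snappy compresses than gzip, by CPU time.
SPEEDUP = RESULTS["snappy"]["compress_mib_s"] / RESULTS["gzip"]["compress_mib_s"]
print(f"snappy output is {SIZE_PENALTY_PCT:.0f}% larger, compression is {SPEEDUP:.1f}x faster")
```

On this file the snappy output is a little under 50% larger than the gzip output, which is within the 20% to 100% range quoted in the introduction. The speed-up over default gzip is roughly sixfold.
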

Use cases

The following projects already use Snappy for compression:

MongoDB
Cassandra
Couchbase
Hadoop
LessFS
LevelDB (which is in turn used by Google Chrome)
Rocksdb
Lucene
VoltDB

It is also used heavily inside Google:

Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as “Zippy” in some presentations and the likes.)

Summary

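If you need fast compression and decompression and can accept a moderate compression ratio, Snappy is a good choice. Persistent storage (logs, for example) and real-time transmission (RPC, for example) are both good fits.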

For convenience, I wrote a gzip-like command-line tool, snappy, which can compress files on demand or compress a stream on the fly through a pipe. Project page: https://github.com/ma6174/snappy
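
The general shape of such a pipe-friendly compressor can be sketched as follows. This is not the code of the tool linked above: it uses Python's built-in zlib with gzip framing as a stand-in for Snappy, and the function names and chunk size are assumptions made for illustration.

```python
import io
import zlib

GZIP_WBITS = 31  # zlib convention: 16 + 15 selects the gzip container format


def compress_stream(src, dst, chunk_size=64 * 1024):
    """Read src in chunks and write compressed output to dst as it is produced."""
    compressor = zlib.compressobj(1, zlib.DEFLATED, GZIP_WBITS)  # level 1: fastest
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
    dst.write(compressor.flush())  # emit any buffered data and the trailer


def decompress_stream(src, dst, chunk_size=64 * 1024):
    """Inverse of compress_stream, also working chunk by chunk."""
    decompressor = zlib.decompressobj(GZIP_WBITS)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))
    dst.write(decompressor.flush())


# In-memory round trip; a real filter would pass sys.stdin.buffer and
# sys.stdout.buffer instead of BytesIO objects.
original = b"log line: request handled\n" * 50_000
packed = io.BytesIO()
compress_stream(io.BytesIO(original), packed)
packed.seek(0)
restored = io.BytesIO()
decompress_stream(packed, restored)
print(len(original), len(packed.getvalue()), restored.getvalue() == original)
```

Because only one chunk is held in memory at a time, the same loop works for files larger than RAM and for data arriving over a pipe.
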

ma6174 avatar Aug 01 '15 13:08 ma6174

$ cat /Users/lidaobing/go/bin/snappy | snappy > snappy.snappy
$ ls -l `which snappy` snappy.snappy
-rwxr-xr-x  1 lidaobing  360BUYAD\Domain Users  2818336 May 20 17:12 /Users/lidaobing/go/bin/snappy
-rw-r--r--  1 lidaobing  360BUYAD\Domain Users  1843684 May 20 17:12 snappy.snappy

lidaobing avatar May 20 '19 09:05 lidaobing