New Feature Add! Delta compression ratio can reach up to 77.88x!

Open apple-ouyang opened this issue 2 years ago • 5 comments

Description

Titan can now use delta compression. Here is my code repository. According to the test results, the compression ratio for delta-compressed records can reach up to 77.88x. However, the shrink ratio of the database's on-disk size is much smaller. You can see my test results below.
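
One plausible reason for the gap between the two ratios is that only records with a similar base record get delta compressed, so the untouched data bounds the overall gain. The sketch below is a back-of-the-envelope illustration, not part of the test program: the function name `overall_ratio` and all numbers are hypothetical, and it ignores any block compression the database already applies.

```python
import math


def overall_ratio(total_bytes: float, delta_input_bytes: float, delta_ratio: float) -> float:
    """Whole-dataset ratio when only `delta_input_bytes` of the data is delta compressed."""
    if not 0 <= delta_input_bytes <= total_bytes:
        raise ValueError("delta_input_bytes must be between 0 and total_bytes")
    if delta_ratio <= 0:
        raise ValueError("delta_ratio must be positive")
    untouched = total_bytes - delta_input_bytes          # stored as before
    after = untouched + delta_input_bytes / delta_ratio  # delta-compressed part shrinks
    return total_bytes / after


# Hypothetical inputs: 30% of the data is delta compressible at 50x.
print(round(overall_ratio(1000, 300, 50.0), 2))      # 1.42
# Even an unbounded delta ratio cannot beat total / untouched.
print(round(overall_ratio(1000, 300, math.inf), 2))  # 1.43
```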

Delta Compression procedure

  1. Every call to Put generates a feature for the record using the Odess similarity detection method.
  2. The feature of the record is stored in the feature index table. Each column family has its own table.
  3. During GC, every valid record is looked up by its feature to find similar records.
  4. Once similar records are found in the table, they are compressed into one base record plus multiple deltas.
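
Steps 1 to 3 can be sketched roughly as follows. This is a simplified illustration and not the code from the repository: it loosely follows the idea behind Odess (content-defined sampling of a Gear rolling hash, then keeping the maximum under a few linear transforms). The constants, the number of features, the `FeatureIndex` class, and the "any shared feature counts as similar" rule are all assumptions made for the example. Titan itself is C++; Python is used here only for brevity.

```python
import random
from collections import defaultdict
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF
SAMPLE_MASK = 0xE0000000  # assumed: sample a hash when its top 3 bits are zero (about 1 in 8)
# Assumed (multiplier, addend) pairs; odd multipliers keep each transform a bijection mod 2^32.
TRANSFORMS = [(0x9E3779B1, 0x7F4A7C15), (0x85EBCA6B, 0xC2B2AE35), (0x27D4EB2F, 0x165667B1)]

# Gear table: one pseudo-random 32-bit value per byte value (the seed is arbitrary).
_rng = random.Random(2022)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def features(value: bytes) -> Tuple[int, ...]:
    """Return one feature per transform: the max transformed value over sampled hashes."""
    feats = [0] * len(TRANSFORMS)
    h = 0
    for b in value:
        # Gear rolling hash: after 32 shifts a byte no longer influences h.
        h = ((h << 1) + GEAR[b]) & MASK32
        if (h & SAMPLE_MASK) == 0:  # content-defined sampling
            for slot, (mul, add) in enumerate(TRANSFORMS):
                feats[slot] = max(feats[slot], (mul * h + add) & MASK32)
    return tuple(feats)


class FeatureIndex:
    """Hypothetical per-column-family table mapping (slot, feature) to record keys."""

    def __init__(self) -> None:
        self._by_feature = defaultdict(list)

    def put(self, key: bytes, value: bytes) -> None:
        # Step 1 and 2: compute features on Put and store them in the table.
        for slot, feat in enumerate(features(value)):
            self._by_feature[(slot, feat)].append(key)

    def find_similar(self, key: bytes, value: bytes) -> Optional[bytes]:
        # Step 3: during GC, look for another record sharing at least one feature.
        for slot, feat in enumerate(features(value)):
            for other in self._by_feature.get((slot, feat), ()):
                if other != key:
                    return other
        return None
```

A record found by `find_similar` would then serve as the base for step 4, with the current record stored as a delta against it.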

Question

I want to test the impact of delta compression on Titan, but I see two tools for testing:

  1. the script in /tools
  2. go-ycsb, as used in this article

Here is my question:

  1. If I use the script in /tools, there are a lot of workloads. Which one should I choose?
  2. If I use go-ycsb, are there any parameters I can use so that my results are comparable with those in this article?

Test result

Here is the summary of the results from titan_delta_compression_test.

Enron Email

517401 records have been put into titan database!

1.40GB(1420666341) are the size of keys and values

59113 (11.42%) is the number of similar records that can be delta compressed

| Method | Compress fail | Compress success | Delta size (before) | Delta size (after) | Delta compress ratio | Compress time |
|---|---|---|---|---|---|---|
| kGDelta | 0 | 97799 | 978.90MB | 17.10MB | 57.48 | 1.05s |
| kXDelta | 0 | 97799 | 978.90MB | 12.60MB | 77.88 | 11.60s |
| kEDelta | 0 | 97799 | 978.90MB | 25.30MB | 38.83 | 1.40s |

| Method | Database size (before) | Database size (after) | Database compress ratio | Blob files size (before) | Blob files size (after) | Blob file compress ratio |
|---|---|---|---|---|---|---|
| kGDelta | 1.20GB | 974.50MB | 1.17 | 386.90MB | 149.00MB | 2.60 |
| kXDelta | 1.20GB | 974.50MB | 1.17 | 386.90MB | 149.00MB | 2.60 |
| kEDelta | 1.20GB | 974.50MB | 1.17 | 386.90MB | 149.00MB | 2.60 |

Wikipedia

1367732 records have been put into titan database!

19.10GB(20402694776) are the size of keys and values

731224 (53.46%) is the number of similar records that can be delta compressed

| Method | Compress fail | Compress success | Delta size (before) | Delta size (after) | Delta compress ratio | Compress time |
|---|---|---|---|---|---|---|
| kGDelta | 16 | 729411 | 9.70GB | 1.50GB | 6.61 | 69.51s |
| kXDelta | 0 | 729427 | 9.70GB | 741.70MB | 13.34 | 360.59s |
| kEDelta | 19 | 729408 | 9.70GB | 2.20GB | 4.58 | 81.94s |

| Method | Database size (before) | Database size (after) | Database compress ratio | Blob files size (before) | Blob files size (after) | Blob file compress ratio |
|---|---|---|---|---|---|---|
| kGDelta | 7.80GB | 7.30GB | 1.08 | 7.50GB | 6.80GB | 1.10 |
| kXDelta | 7.80GB | 7.30GB | 1.08 | 7.50GB | 6.80GB | 1.10 |
| kEDelta | 7.80GB | 7.30GB | 1.08 | 7.50GB | 6.80GB | 1.10 |

apple-ouyang, May 10 '22 04:05

Good job. What does each method mean?

Connor1996, May 30 '22 10:05

So you developed a new delta compression algorithm, right? I'm curious about the overhead.

Connor1996, May 30 '22 10:05

As for your question, you can just use db_bench in /tools and choose a workload according to your needs.

Connor1996, May 30 '22 10:05

No, I just used the state-of-the-art delta compression method Gdelta in Titan, and compared it with Xdelta and Edelta.
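
For readers unfamiliar with the term, a delta encoder stores a record as a list of instructions against a similar base record: copy this range from the base, or insert these literal bytes. The sketch below illustrates that idea with Python's standard `difflib`; it is not Gdelta, Xdelta, or Edelta, which use their own matching strategies and binary encodings, and the function names `encode_delta` and `apply_delta` are made up for this example.

```python
from difflib import SequenceMatcher


def encode_delta(base: bytes, target: bytes) -> list:
    """Describe `target` as copy/insert instructions relative to `base`."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))       # reuse bytes from the base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))   # literal bytes not in the base
        # "delete" needs no instruction: those base bytes are simply not copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target record from the base record and its delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"The quick brown fox jumps over the lazy dog. " * 20
target = base.replace(b"lazy", b"sleepy", 1)
delta = encode_delta(base, target)
assert apply_delta(base, delta) == target  # the delta round-trips
```

The more similar the two records are, the fewer literal bytes the delta has to carry, which is where large per-record ratios come from.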

apple-ouyang, Jun 03 '22 03:06

@apple-ouyang Do you have any other measurements, such as the impact on CPU load, disk I/O, etc.?

Also, how does it compare to zstd?

cscetbon, Mar 01 '23 04:03