New Feature Added! Delta compression ratio can reach up to 77.88x!
Description
Titan can now use delta compression. Here is my code repository. According to the test results, the compression ratio for compressed records can reach up to 77.88x! However, the database disk size does not shrink by nearly as much. You can see my test results below.
Delta Compression procedure
- Every call to Put generates a feature for the record using the Odess similarity detection method.
- The feature of the record is stored in a feature index table. Every column family has its own table.
- During GC, every valid record is searched for similar records by feature.
- Once similar records are found in the table, they are compressed into one base record + multiple deltas (a sketch follows this list).
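To make the procedure above concrete, here is a minimal, self-contained C++ sketch of how the flow could look. It is not the actual Titan code: the Gear table seeding, the sampling mask, the number of sub-features, and the per-column-family index layout (a plain hash map from feature to key) are all assumptions made for illustration.

```cpp
// Minimal sketch (NOT the actual Titan implementation) of the flow above:
// an Odess-style feature is computed for every value on Put, stored in a
// per-column-family feature index, and looked up during GC to find a similar
// base record for delta encoding.
#include <cstdint>
#include <optional>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

class OdessFeatureExtractor {
 public:
  OdessFeatureExtractor() {
    std::mt19937_64 rng(2022);  // fixed seed so features are deterministic
    for (auto& g : gear_) g = rng();
    for (int i = 0; i < kNumFeatures; ++i) {
      a_[i] = rng() | 1;  // odd multiplier for the linear transform
      b_[i] = rng();
    }
  }

  // Returns one 64-bit super-feature for the value: sample positions with a
  // Gear rolling hash, apply linear transforms, keep the minimum per
  // transform, and fold the minima into a single value.
  uint64_t Extract(const std::string& value) const {
    std::vector<uint64_t> features(kNumFeatures, UINT64_MAX);
    uint64_t h = 0;
    for (unsigned char c : value) {
      h = (h << 1) + gear_[c];       // Gear rolling hash
      if ((h & kSampleMask) == 0) {  // content-defined sampling (~1/128)
        for (int i = 0; i < kNumFeatures; ++i) {
          uint64_t t = a_[i] * h + b_[i];
          if (t < features[i]) features[i] = t;
        }
      }
    }
    uint64_t super = 0;
    for (uint64_t f : features) super ^= f;
    return super;
  }

 private:
  static constexpr int kNumFeatures = 4;
  static constexpr uint64_t kSampleMask = 0x7F;
  uint64_t gear_[256];
  uint64_t a_[kNumFeatures];
  uint64_t b_[kNumFeatures];
};

// Per-column-family feature index: super-feature -> key of a candidate base
// record. Keeping only the latest key per feature is a simplification.
class FeatureIndex {
 public:
  // Called on every Put, after the feature has been extracted.
  void Add(uint64_t feature, const std::string& key) { index_[feature] = key; }

  // Called during GC for every valid record: return the key of another
  // record that shares the same feature, if any.
  std::optional<std::string> FindSimilar(uint64_t feature,
                                         const std::string& self_key) const {
    auto it = index_.find(feature);
    if (it == index_.end() || it->second == self_key) return std::nullopt;
    return it->second;
  }

 private:
  std::unordered_map<uint64_t, std::string> index_;
};
```

In this sketch, a GC hit from FindSimilar would make the record a delta-encoding candidate against the returned base record (with kGDelta, kXDelta, or kEDelta as the encoder), while a miss means the record is rewritten as-is.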
Question
I want to test the impact of delta compression on Titan, but I see two tools for testing: the scripts in /tools and go-ycsb.
Here are my questions:
- If I use the scripts in /tools, there are a lot of workloads; which should I choose?
- If I use go-ycsb, is there any parameter I can use to compare with the results in this article?
Test result
Here is the summary result of titan_delta_compression_test:
Enron Email
517401 records have been put into the Titan database.
1.40GB (1420666341 bytes) is the total size of keys and values.
59113 records (11.42%) are similar records that can be delta compressed.
method | compression failures | compression successes | delta size before | delta size after | delta compression ratio | compression time |
---|---|---|---|---|---|---|
kGDelta | 0 | 97799 | 978.90MB | 17.10MB | 57.48 | 1.05s |
kXDelta | 0 | 97799 | 978.90MB | 12.60MB | 77.88 | 11.60s |
kEDelta | 0 | 97799 | 978.90MB | 25.30MB | 38.83 | 1.40s |
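(For reference, the delta compression ratio here appears to be the delta size before compression divided by the size after, e.g. 978.90MB / 12.60MB ≈ 77.7 for kXDelta; the small gap to the reported 77.88 comes from rounding of the displayed sizes. This reading of the numbers is my own interpretation.)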
method | database size before | database size after | database compression ratio | blob files size before | blob files size after | blob files compression ratio |
---|---|---|---|---|---|---|
kGDelta | 1.20GB | 974.50MB | 1.17 | 386.90MB | 149.00MB | 2.60 |
kXDelta | 1.20GB | 974.50MB | 1.17 | 386.90MB | 149.00MB | 2.60 |
kEDelta | 1.20GB | 974.50MB | 1.17 | 386.90MB | 149.00MB | 2.60 |
Wikipedia
1367732 records have been put into the Titan database.
19.10GB (20402694776 bytes) is the total size of keys and values.
731224 records (53.46%) are similar records that can be delta compressed.
method | compression failures | compression successes | delta size before | delta size after | delta compression ratio | compression time |
---|---|---|---|---|---|---|
kGDelta | 16 | 729411 | 9.70GB | 1.50GB | 6.61 | 69.51s |
kXDelta | 0 | 729427 | 9.70GB | 741.70MB | 13.34 | 360.59s |
kEDelta | 19 | 729408 | 9.70GB | 2.20GB | 4.58 | 81.94s |
method | database size before | database size after | database compression ratio | blob files size before | blob files size after | blob files compression ratio |
---|---|---|---|---|---|---|
kGDelta | 7.80GB | 7.30GB | 1.08 | 7.50GB | 6.80GB | 1.10 |
kXDelta | 7.80GB | 7.30GB | 1.08 | 7.50GB | 6.80GB | 1.10 |
kEDelta | 7.80GB | 7.30GB | 1.08 | 7.50GB | 6.80GB | 1.10 |
Good job! What's the meaning of each method?
So you developed a new delta compression algorithm, right? I'm curious about the overhead.
For the question, you can just use db_bench in /tools and choose the workload that fits your needs.
No, I just used the state-of-the-art delta compression algorithm Gdelta in Titan, and compared it with Xdelta and Edelta.
@apple-ouyang Do you have any other measurements, like the impact on CPU load, disk I/O, etc.?
Also, how does it compare to zstd?