A proposal for Improving Key Format

Open rockeet opened this issue 5 years ago • 2 comments

The current key format is SlotID | Type | DBID | PK | 0 | Version | SK | PK_LEN | Reserved. The PK_LEN in the suffix hurts some compression algorithms, such as terarkdb's NestLoudsTrie.

A better solution is to escape each '\0' byte in PK as '\0\0' (two '\0' bytes) and to require that Version is never 0. PK can then be identified unambiguously, and PK_LEN is no longer needed. An additional advantage is that the key size is reduced in most cases, because a '\0' byte in PK is very unlikely in a Redis-like DB (which Tendis targets).
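
To make the idea concrete, here is a minimal sketch of the escaping in Python. It is only an illustration, not Tendis code: `encode_pk` and `decode_pk` are hypothetical names, and it assumes that "Version is never 0" means the first encoded byte after the terminator is never `0x00`.

```python
from typing import Tuple


def encode_pk(pk: bytes) -> bytes:
    """Escape every 0x00 in pk as 0x00 0x00, then append one 0x00 terminator.

    Hypothetical helper for illustration; not part of Tendis.
    """
    return pk.replace(b"\x00", b"\x00\x00") + b"\x00"


def decode_pk(buf: bytes, start: int = 0) -> Tuple[bytes, int]:
    """Read an escaped PK from buf, beginning at offset start.

    Returns (pk, offset of the first byte after the terminator).
    Assumes the byte following the terminator (the start of Version)
    is never 0x00; otherwise the terminator would look like an escape.
    """
    out = bytearray()
    i = start
    while i < len(buf):
        if buf[i] != 0:
            # Ordinary byte: copy it through.
            out.append(buf[i])
            i += 1
        elif i + 1 < len(buf) and buf[i + 1] == 0:
            # 0x00 0x00 is an escaped zero byte inside PK.
            out.append(0)
            i += 2
        else:
            # A single 0x00 followed by a non-zero byte (or end of input)
            # terminates PK.
            return bytes(out), i + 1
    raise ValueError("unterminated PK")


# Usage: PK "a\0b", then a non-zero Version byte, then SK.
key = encode_pk(b"a\x00b") + b"\x01" + b"field"
pk, rest = decode_pk(key)
assert pk == b"a\x00b" and key[rest:] == b"\x01field"
```

For a PK without any '\0' byte, the only overhead is the single terminator byte that the current format already has, so dropping PK_LEN is a pure saving in that case.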

rockeet avatar Jan 08 '21 09:01 rockeet

Thanks for the suggestion.

If PK_LEN is removed, how much would the compression ratio improve?

TendisDev avatar Jan 08 '21 09:01 TendisDev

terarkdb's NestLoudsTrie compresses suffixes with the same algorithm as prefixes; if the suffix tails contain differing bytes, the algorithm's effectiveness degrades greatly. The actual compression ratio depends on the data set; it is generally 5x~10x.

With the proposed solution, even if users do not use terarkdb, the encoded key length will be reduced by 1 byte in most cases, because '\0' is very rare in PK.

rockeet avatar Jan 08 '21 10:01 rockeet