KeyDB icon indicating copy to clipboard operation
KeyDB copied to clipboard

Optimize rdbEncodeInteger using bit operations

Open tryfinally opened this issue 5 years ago • 4 comments

use sign extension mov instructions into 8/16/32 bits int registers to check if value fits into 8/16/32 bit width.

// return number of bytes needed to encode (1,2,4 or 0 if above)
int canEncodeInBytes(long long value){ 
    struct SignExtendBits{
        long long bits8: 8; 
        long long bits16: 16;
        long long bits32: 32;|
    } v;           
    v.bits8 = value;
    if ( v.bits8 == value ) return 1; 
    v.bits16 = value; 
    if ( v.bits16 == value ) return 2; 
    v.bits32 = value; 
    if ( v.bits32 == value ) return 4; 
    return 0; 
} 

Compiles to compact code even with -O1 :

function(long long):                           # @function(long long)
        movsx   rcx, dil
        mov     eax, 1
        cmp     rcx, rdi
        je      .LBB8_3
        movsx   rcx, di
        mov     eax, 2
        cmp     rcx, rdi
        je      .LBB8_3
        movsxd  rcx, edi
        xor     eax, eax
        cmp     rcx, rdi
        sete    al
        shl     eax, 2
.LBB8_3:
        ret

tryfinally avatar Jun 13 '20 13:06 tryfinally

Hi @tryfinally do you have an example that shows the perf improvement?

JohnSully avatar Jun 13 '20 14:06 JohnSully

Nope, but look on the assembly. I know these are micro optimizations. It would be good to know if you are not welcoming those. I would agree that macro optimizations needs per data.

tryfinally avatar Jun 13 '20 15:06 tryfinally

OK let me do some save/load testing to see the impact.

In general optimization only fixes should have some scenario or perf data to justify them. Sometimes perf results can be counterintuitive so its important to keep it data driven.

That said I don’t want to discourage tinkering or contributions, so this is not going to be a super high bar.

JohnSully avatar Jun 13 '20 16:06 JohnSully

63 ^ __builtin_clzll(value) is a better fit there. it's branchless and cross platform. it returns the msb index (floor of log2) which you can divide by 8 to get byte width.

romange avatar Sep 30 '20 09:09 romange