compile-time-regular-expressions

[WIP] Accelerate string matching

Open Andersama opened this issue 4 years ago • 2 comments

Optimizes string matching by allowing memcmp-like functionality (even on UTF-8 sequences).

comparison: https://compiler-explorer.com/z/Tz3KhG

The rough idea is that ordinarily we do two comparisons per character (one to check the iterators, one to validate the character) followed by a branch. Presumably we can improve on this by reducing comparisons and jumps: if we know enough about the iterator to work out how many characters remain, we can use some trickery to do the comparisons in as branchless a manner as possible. This vaguely implements the idea from #147, just for string<> at the moment.
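A minimal standalone sketch of the contrast described above (hypothetical function names, not CTRE's actual code): the usual loop pays an iterator check, a character check, and a branch per character, while a single up-front length check lets the per-character work become a branchless XOR accumulation.

```cpp
#include <cassert>
#include <cstddef>

// Baseline: two comparisons and a branch per character.
bool match_per_char(const char* it, const char* end,
                    const char* pat, std::size_t n) {
    for (std::size_t i = 0; i != n; ++i) {
        if (it == end) return false;      // iterator check
        if (*it != pat[i]) return false;  // character check + branch
        ++it;
    }
    return true;
}

// If we can compute the remaining length up front, one bounds check
// suffices, and differences accumulate branchlessly in a register.
bool match_branchless(const char* it, const char* end,
                      const char* pat, std::size_t n) {
    if (static_cast<std::size_t>(end - it) < n) return false;
    unsigned char diff = 0;
    for (std::size_t i = 0; i != n; ++i)
        diff |= static_cast<unsigned char>(it[i]) ^
                static_cast<unsigned char>(pat[i]);
    return diff == 0;  // zero iff every character matched
}
```

The branchless form is what gives the compiler room to vectorize the fixed-length comparison.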

It appears Clang and MSVC pick up on this change and optimize with SSE; presumably GCC might as well.

The fold expression responsible for validating the match can definitely be played around with. We could count the characters matched, count the differences, or, as here, accumulate bitwise differences in a register:

	return { current + bump, (count >= sizeof...(String)) && (((current[Idx] ^ static_cast<std::remove_cvref_t<char_type>>(String)) | ... | static_cast<char_type>(0)) == 0) };

Andersama avatar Dec 28 '20 15:12 Andersama

I haven't benchmarked things yet, but I've been playing around since the tests passed. It seems like combining the == and & operators generates the same assembly that Clang uses for std::memcmp.
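A sketch of that == / & variant (hypothetical name, same assumed packs as before): each comparison yields 0 or 1, and & combines them without short-circuiting, so there is no per-character branch for the optimizer to preserve.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Non-short-circuiting "all characters equal" via == and &.
template <char... String, std::size_t... Idx>
bool all_equal_and(const char* current, std::index_sequence<Idx...>) {
    return (true & ... & (current[Idx] == String));
}
```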

Andersama avatar Dec 30 '20 10:12 Andersama

I'm getting pretty significant improvements to matching/searching from this. When using the == and & operators, matching against the pattern 0123456789ABCDEF, for example, turns into a single 16-byte SSE comparison instruction: https://compiler-explorer.com/z/1bzKKd

match_str(char*, char*):                       # @match_str(char*, char*)
        mov     rax, rsi
        sub     rax, rdi
        cmp     rax, 16
        jb      .LBB0_1
        movdqu  xmm0, xmmword ptr [rdi]
        pcmpeqb xmm0, xmmword ptr [rip + .LCPI0_0]
        pmovmskb        eax, xmm0
        cmp     ax, -1
        sete    cl
        add     rdi, 16
        cmp     rdi, rsi
        sete    al
        and     al, cl
        ret
.LBB0_1:
        xor     eax, eax
        ret

vs...

        cmp     rdi, rsi
        je      .LBB0_33
        lea     rcx, [rdi + 1]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi], 48
        jne     .LBB0_32
        lea     rcx, [rdi + 2]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi + 1], 49
        jne     .LBB0_32
        lea     rcx, [rdi + 3]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi + 2], 50
        jne     .LBB0_32
        lea     rcx, [rdi + 4]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi + 3], 51
        jne     .LBB0_32
        
        #... (this goes on for quite a while)
        
        cmp     byte ptr [rdi + 15], 70
        sete    cl
        add     rdi, 16
        cmp     rdi, rsi
        sete    al
        and     al, cl
        jmp     .LBB0_32
.LBB0_33:
        xor     eax, eax
.LBB0_32:
        ret
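For reference, a hypothetical reconstruction of the kind of match_str that produces the first, vectorized listing (the Compiler Explorer link has the actual code): one length check up front, then a branchless & accumulation over all 16 characters, plus the end-of-range check the assembly performs after `add rdi, 16`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical match_str for the pattern "0123456789ABCDEF":
// Clang folds the 16 non-short-circuiting comparisons into
// movdqu / pcmpeqb / pmovmskb, as in the first listing above.
bool match_str(const char* first, const char* last) {
    static constexpr char pattern[16] = {'0', '1', '2', '3', '4', '5',
                                         '6', '7', '8', '9', 'A', 'B',
                                         'C', 'D', 'E', 'F'};
    if (static_cast<std::size_t>(last - first) < 16)
        return false;                       // cmp rax, 16 / jb
    bool ok = true;
    for (int i = 0; i != 16; ++i)
        ok &= (first[i] == pattern[i]);     // branchless accumulation
    return ok & (first + 16 == last);       // full-range match check
}
```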

Andersama avatar Dec 31 '20 03:12 Andersama