compile-time-regular-expressions
[WIP] Accelerate string matching
Optimizes string matching by allowing memcmp-like functionality (even on UTF-8 sequences).
comparison: https://compiler-explorer.com/z/Tz3KhG
The rough idea is that ordinarily we do two comparisons per character, one to check the iterators and one to validate the character, followed by a branch. Presumably we can improve on this by reducing comparisons and jumps: if we know enough about the iterator to work out how many characters remain, we can check the length once up front and then use some trickery to do the comparisons in as branchless a manner as possible. This vaguely implements the idea from #147, just for string<> at the moment. A standalone sketch of the difference follows.
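To make that concrete, here is a minimal sketch (the function names and signatures are made up for illustration; this is not the ctre code itself):

#include <cstddef>

// Ordinary approach: an iterator check, a character check, and a branch
// for every character of the pattern.
bool match_per_char(const char* begin, const char* end,
                    const char* pattern, std::size_t n) {
    for (std::size_t i = 0; i != n; ++i) {
        if (begin == end) return false;          // iterator comparison
        if (*begin != pattern[i]) return false;  // character comparison + branch
        ++begin;
    }
    return true;
}

// Bulk approach: one length check up front, then branchless accumulation
// of bitwise differences in a register.
bool match_bulk(const char* begin, const char* end,
                const char* pattern, std::size_t n) {
    if (static_cast<std::size_t>(end - begin) < n) return false;
    unsigned char diff = 0;
    for (std::size_t i = 0; i != n; ++i)
        diff |= static_cast<unsigned char>(begin[i] ^ pattern[i]);
    return diff == 0;
}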
It appears Clang and MSVC pick up on this change and optimize with SSE; presumably GCC might as well.
The fold expression responsible for validating the match can definitely be played around with. We could count the characters matched, count the differences, or, as here, accumulate the bitwise differences in a register:
return { current + bump, (count >= sizeof...(String)) && (((current[Idx] ^ static_cast<std::remove_cvref_t<char_type>>(String)) | ... | static_cast<char_type>(0)) == 0) };
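Stripped of the character-type casts and the result-struct plumbing, the shape of that fold is roughly the following (a simplified C++17 sketch, not the actual ctre evaluator):

#include <cstddef>
#include <utility>

// String... is the pattern, Idx... the matching index sequence. The fold
// ORs together the XOR of each input character with its pattern character,
// so the whole pattern matches iff the accumulated difference is zero.
// The length check short-circuits before any out-of-bounds read.
template <char... String, std::size_t... Idx>
constexpr bool match_impl(const char* current, std::size_t count,
                          std::index_sequence<Idx...>) {
    return (count >= sizeof...(String))
        && (((current[Idx] ^ String) | ... | 0) == 0);
}

template <char... String>
constexpr bool match(const char* current, const char* last) {
    return match_impl<String...>(
        current, static_cast<std::size_t>(last - current),
        std::make_index_sequence<sizeof...(String)>{});
}

constexpr char text[] = "abc";
static_assert(match<'a', 'b', 'c'>(text, text + 3));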
I haven't benchmarked anything yet, but I've been playing around since the tests passed. It seems like combining the == and & operators generates the same assembly that Clang uses for std::memcmp.
I'm getting pretty significant improvements to matching/searching from this. When using the == and & operators, matching against the pattern 0123456789ABCDEF, for example, turns into a single 16-byte SSE comparison.
https://compiler-explorer.com/z/1bzKKd
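The source behind the first listing is along these lines (reconstructed from the assembly, so it may differ from the exact code in the link): each character comparison uses ==, the results are combined with the non-short-circuiting &, and Clang collapses the fixed 16-byte loop into one SSE compare.

#include <cstddef>

bool match_str(const char* current, const char* end) {
    constexpr char pattern[] = {'0', '1', '2', '3', '4', '5', '6', '7',
                                '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
    // Single length check, mirroring the cmp/jb at the top of the listing.
    if (static_cast<std::size_t>(end - current) < sizeof pattern)
        return false;
    bool ok = true;
    for (std::size_t i = 0; i != sizeof pattern; ++i)
        ok = ok & (current[i] == pattern[i]); // & instead of &&: no branch per character
    // Also require the match to consume the whole input, as the listing does
    // (add rdi, 16 / cmp rdi, rsi / sete al).
    return ok & (current + sizeof pattern == end);
}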
match_str(char*, char*): # @match_str(char*, char*)
mov rax, rsi
sub rax, rdi
cmp rax, 16
jb .LBB0_1
movdqu xmm0, xmmword ptr [rdi]
pcmpeqb xmm0, xmmword ptr [rip + .LCPI0_0]
pmovmskb eax, xmm0
cmp ax, -1
sete cl
add rdi, 16
cmp rdi, rsi
sete al
and al, cl
ret
.LBB0_1:
xor eax, eax
ret
vs...
cmp rdi, rsi
je .LBB0_33
lea rcx, [rdi + 1]
xor eax, eax
cmp rcx, rsi
je .LBB0_32
cmp byte ptr [rdi], 48
jne .LBB0_32
lea rcx, [rdi + 2]
xor eax, eax
cmp rcx, rsi
je .LBB0_32
cmp byte ptr [rdi + 1], 49
jne .LBB0_32
lea rcx, [rdi + 3]
xor eax, eax
cmp rcx, rsi
je .LBB0_32
cmp byte ptr [rdi + 2], 50
jne .LBB0_32
lea rcx, [rdi + 4]
xor eax, eax
cmp rcx, rsi
je .LBB0_32
cmp byte ptr [rdi + 3], 51
jne .LBB0_32
#... (this continues for quite a while)
cmp byte ptr [rdi + 15], 70
sete cl
add rdi, 16
cmp rdi, rsi
sete al
and al, cl
jmp .LBB0_32
.LBB0_33:
xor eax, eax
.LBB0_32:
ret