crystal
crystal copied to clipboard
Use simpler faster Rabin-Karp-like search for short needle
I see some overlap with the implementations of #index(Char)/#byte_index(Char) which both already handle bytesizes smaller than 4. I figure those use cases could be merged with these optimized algorithms?
This and #13819 could definitely share the same internal method, now that that PR is merged. The only caveat is that here the 4 bytes might represent more than 1 character
I doubd it is feasible to refactor "fast RK" into separate method:
- character position is counted in
index(String), and it is not needed to be counted inbyteindex(String) index(Char)already uses per-char iteration which doesn't use Rabin-Karp. Does it worth to change it to RK?- at most
byteindex(Char)andbyteindex(String)could be unified for short strings, I believe.
I suppose it's best to merge this as is then. We can try to refactor and deduplicate once it's in.
Do you have any benchmarks to indicate the performance impact of this change?
Why did you chose UInt32 for the short needle? I figure we could use UInt64 (or UInt128) for search strings with up to 8 (16) bytes? There should be no performance degradation at least for 64-bit arithmetics on 64-bit architectures.
Bigger search strings would need SIMD support though (ref #3057).
@straight-shoota good question about integer size.
If 32bit platforms are not first-class targets for Crystal, then I'd like to use UInt64 certainly.
UInt128 doubtfully would be good cause it is not native.
For benchmarks: I used to bench RK in C, and saw impact of double multiplication. But ok, I will do benchmark for this PR.
32-bit targets are certainly getting less important. This is an optimization and I expect even with a 64-bit hash performance would still be reasonable on 32-bit architectures. So I think there's no reason not to go for 64-bits.
I've pushed refactoring commit, and benchmarked.
I use UInt32 for chars and UInt64 for strings in "fast rabin-karp".
It is not fastest variant: using UInt32 for small strings and UInt64 for 5-8 charracter strings were faster, but then performance for 2-4 byte string and 5-8 byte string were the same. I couldn't explain it, so I just drop UInt32 for 2-4 byte strings.
Really, it was nightmare benchmarking: simple changes could affect result upto 16% without any meaningful reason :cry:
Benchmark results on Ryzen 7 5825U at 2GHz with this scripts
| was idx | now idx | diff idx | was bidx | now bidx | diff bidx | |
|---|---|---|---|---|---|---|
| small 1 | 0.068663 | 0.061373 | -10.62 | 0.071392 | 0.008054 | -88.72 |
| medium 1 | 0.503822 | 0.334602 | -33.59 | 0.545443 | 0.010590 | -98.06 |
| big 1 | 4.697361 | 2.865704 | -38.99 | 5.058999 | 0.027877 | -99.45 |
| small 2 | 0.069861 | 0.053189 | -23.86 | 0.073254 | 0.034132 | -53.41 |
| medium 2 | 0.509033 | 0.336184 | -33.96 | 0.548318 | 0.235357 | -57.08 |
| big 2 | 4.730817 | 3.117069 | -34.11 | 5.062775 | 2.049732 | -59.51 |
| small 4 | 0.077225 | 0.058108 | -24.75 | 0.079227 | 0.036833 | -53.51 |
| medium 4 | 0.510799 | 0.340637 | -33.31 | 0.554923 | 0.242836 | -56.24 |
| big 4 | 4.702089 | 3.096297 | -34.15 | 5.069510 | 2.052939 | -59.50 |
| small 6 | 0.078827 | 0.057426 | -27.15 | 0.080145 | 0.038325 | -52.18 |
| medium 6 | 0.511529 | 0.339824 | -33.57 | 0.559667 | 0.241253 | -56.89 |
| big 6 | 4.700362 | 3.093604 | -34.18 | 5.076374 | 2.053248 | -59.55 |
| small 10 | 0.088149 | 0.077928 | -11.60 | 0.084642 | 0.070408 | -16.82 |
| medium 10 | 0.517613 | 0.511492 | -1.18 | 0.568815 | 0.555601 | -2.32 |
| big 10 | 4.699969 | 4.695120 | -0.10 | 5.085927 | 5.074980 | -0.22 |
| small 14 | 0.089366 | 0.080371 | -10.07 | 0.089447 | 0.072904 | -18.49 |
| medium 14 | 0.523864 | 0.511200 | -2.42 | 0.575362 | 0.556926 | -3.20 |
| big 14 | 4.706731 | 4.701650 | -0.11 | 5.092217 | 5.071227 | -0.41 |
| small 18 | 0.095913 | 0.082964 | -13.50 | 0.098844 | 0.078447 | -20.64 |
| medium 18 | 0.549657 | 0.531382 | -3.32 | 0.585257 | 0.561600 | -4.04 |
| big 18 | 4.744925 | 4.725939 | -0.40 | 5.100282 | 5.076517 | -0.47 |
UInt128 didn't show difference with full Rabin-Karp, so I don't include it into final variant.
May I rebase and squash commits?
No, merge as usual
@HertzDevil it’s pity: a lot of small commits without much sense. Personally I prefer cleaner history, so I tend to rearrange commits after review before merge. This way it goes at work, and in FOSS projects which lives outside of github.
But, ok. I calm down.
Sorry for long delay.
Merged master into branch. Added commit with review fixes.