stringi
match, pmatch
Current version of stri_in_fixed (with boost::unordered_map):
Unit: microseconds
expr                                   min          lq      median          uq         max  neval
match(x100, x100)                   10.080     14.2310     37.0440     45.2725     121.462    100
match(x1000, x1000)                 67.136     77.4465    106.8275    118.6175     252.076    100
match(x10000, x10000)              702.594    812.8285    870.2805    905.0895   21793.674    100
match(x100000, x100000)          13429.503  13881.8395  14623.7190  33414.6765  136544.595    100
stri_in_fixed(x100, x100)           46.309     76.5255    103.8125    130.0185     194.928    100
stri_in_fixed(x1000, x1000)        411.812    461.3915    534.5720    610.9745     919.369    100
stri_in_fixed(x10000, x10000)     4991.309   5301.7590   5434.6940   5600.4165    8130.862    100
stri_in_fixed(x100000, x100000)  94513.115  96250.2445  97535.2510  99020.6875  119841.516    100
Unsatisfying, full stop: at the median, stri_in_fixed is roughly 3 to 7 times slower than match().
R's match() calls do_match5, which uses an R-internal string hash table directly, so I doubt we can get any faster than that. Should stri_in_fixed then be implemented as match(stri_enc_toutf8(x), stri_enc_toutf8(x))?
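For reference, the hash-based lookup being compared here can be sketched in a few lines of C++. This is only an illustration of the general technique, not the stringi code: `match_hash` is a hypothetical name, `std::unordered_map` stands in for `boost::unordered_map`, strings are compared byte-wise, and 0 is used as the "no match" value where R would return NA.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of stringi: for each needle, return the
// 1-based position of its first occurrence in `table`, or 0 if it is absent.
std::vector<int> match_hash(const std::vector<std::string>& needles,
                            const std::vector<std::string>& table) {
    // Build phase: map each distinct string to the index where it first appears.
    std::unordered_map<std::string, int> first_pos;
    first_pos.reserve(table.size());
    for (std::size_t i = 0; i < table.size(); ++i) {
        // emplace does not overwrite an existing key, so the first occurrence wins.
        first_pos.emplace(table[i], static_cast<int>(i) + 1);
    }

    // Probe phase: one expected O(1) lookup per needle.
    std::vector<int> out;
    out.reserve(needles.size());
    for (const std::string& s : needles) {
        auto it = first_pos.find(s);
        out.push_back(it == first_pos.end() ? 0 : it->second);
    }
    return out;
}
```

A sketch like this hashes every string's bytes when building and again when probing. An R-internal string hash table can plausibly skip part of that work, which would be consistent with the gap in the timings above.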
To be done: stri_is_coll + pmatch + %in%?
Also for sorted haystacks (binary search/...).
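The sorted-haystack idea could look roughly like the following. It is a minimal sketch under assumptions: `match_sorted` is a hypothetical name, the table is assumed to be already sorted in byte-wise (`std::string` `operator<`) order rather than by a collator, and 0 again stands for "no match".

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of stringi: `sorted_table` must be sorted in
// ascending byte-wise order. Returns, for each needle, the 1-based position of
// its first occurrence in the table, or 0 if it is absent.
std::vector<int> match_sorted(const std::vector<std::string>& needles,
                              const std::vector<std::string>& sorted_table) {
    std::vector<int> out;
    out.reserve(needles.size());
    for (const std::string& s : needles) {
        // lower_bound finds the first element that is not less than s, so for
        // duplicated values it lands on the first occurrence.
        auto it = std::lower_bound(sorted_table.begin(), sorted_table.end(), s);
        bool found = (it != sorted_table.end() && *it == s);
        out.push_back(found ? static_cast<int>(it - sorted_table.begin()) + 1 : 0);
    }
    return out;
}
```

Each lookup costs O(log n) string comparisons and there is no build step, so this only pays off when the haystack is known to be sorted in advance. A collation-aware variant would need the same comparator for sorting and for searching.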