stringi

match, pmatch

Open: gagolews opened this issue 10 years ago • 4 comments

gagolews, Oct 25 '13

Benchmark of the current version of stri_in_fixed (implemented with boost::unordered_map) against base R's match():

Unit: microseconds
                            expr        min          lq      median          uq        max neval
               match(x100, x100)     10.080     14.2310     37.0440     45.2725    121.462   100
             match(x1000, x1000)     67.136     77.4465    106.8275    118.6175    252.076   100
           match(x10000, x10000)    702.594    812.8285    870.2805    905.0895  21793.674   100
         match(x100000, x100000)  13429.503  13881.8395  14623.7190  33414.6765 136544.595   100
       stri_in_fixed(x100, x100)     46.309     76.5255    103.8125    130.0185    194.928   100
     stri_in_fixed(x1000, x1000)    411.812    461.3915    534.5720    610.9745    919.369   100
   stri_in_fixed(x10000, x10000)   4991.309   5301.7590   5434.6940   5600.4165   8130.862   100
 stri_in_fixed(x100000, x100000)  94513.115  96250.2445  97535.2510  99020.6875 119841.516   100

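Unsatisfying. Period. Judging by the medians, stri_in_fixed is roughly 3 to 7 times slower than match(), and the gap grows with the input size.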
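For reference, here is a minimal C++ sketch of the hash-based approach being benchmarked. It builds a map from each table string to its first position, then looks up each element of x. It uses std::unordered_map in place of boost::unordered_map, and the names hash_match, hash_in and kNoMatch (a stand-in for NA) are hypothetical, not stringi's actual code. Note that every lookup hashes and compares the full string contents.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sentinel standing in for R's NA_integer_.
constexpr long kNoMatch = -1;

// match(x, table): for each x[i], the 0-based position of its FIRST occurrence
// in `table`, or kNoMatch. Comparison is byte-wise ("fixed"), so both inputs
// are assumed to be in the same encoding (e.g. already converted to UTF-8).
std::vector<long> hash_match(const std::vector<std::string>& x,
                             const std::vector<std::string>& table) {
    std::unordered_map<std::string, long> first_pos;
    first_pos.reserve(table.size());
    for (std::size_t j = 0; j < table.size(); ++j) {
        // emplace never overwrites, so the first occurrence wins, as in match().
        first_pos.emplace(table[j], static_cast<long>(j));
    }
    std::vector<long> result;
    result.reserve(x.size());
    for (const std::string& s : x) {
        auto it = first_pos.find(s);  // hashes all characters of s
        result.push_back(it == first_pos.end() ? kNoMatch : it->second);
    }
    return result;
}

// x %in% table: true wherever a match was found.
std::vector<bool> hash_in(const std::vector<std::string>& x,
                          const std::vector<std::string>& table) {
    std::vector<bool> out;
    out.reserve(x.size());
    for (long pos : hash_match(x, table)) out.push_back(pos != kNoMatch);
    return out;
}
```

hash_in() mirrors R's definition of %in%, which is match(x, table, nomatch = 0) > 0.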

gagolews, Jun 06 '14

R's match() calls do_match5, which uses R's internal string hash table directly, so I doubt we can get any faster than that. Should stri_in_fixed then be implemented as match(stri_enc_toutf8(x), stri_enc_toutf8(x))?
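A likely reason match() is hard to beat is that R keeps a global cache of CHARSXPs. Each distinct string (in a given encoding) is stored once, and, as far as we can tell from R's unique.c, match() can then hash and compare CHARSXP pointers rather than characters. Converting both sides to UTF-8 first presumably keeps match() on this pointer-based path, since the same text in different encodings ends up in different cache entries. The C++ sketch below imitates the idea. StringPool and match_interned are hypothetical names, and the toy pool only loosely models R's cache.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy global string pool (hypothetical; only loosely modelled on R's CHARSXP
// cache). Each distinct string is stored once, so equal strings share an address.
class StringPool {
public:
    const std::string* intern(const std::string& s) {
        // Elements of std::unordered_set are node-based: pointers to them stay
        // valid across rehashing, so the returned address is stable.
        return &*pool_.insert(s).first;
    }
private:
    std::unordered_set<std::string> pool_;
};

// match() over interned strings: hashing and equality work on pointers only,
// so no per-lookup pass over the characters is needed.
std::vector<long> match_interned(const std::vector<const std::string*>& x,
                                 const std::vector<const std::string*>& table) {
    std::unordered_map<const std::string*, long> first_pos;
    first_pos.reserve(table.size());
    for (std::size_t j = 0; j < table.size(); ++j)
        first_pos.emplace(table[j], static_cast<long>(j));  // first occurrence wins
    std::vector<long> result;
    result.reserve(x.size());
    for (const std::string* p : x) {
        auto it = first_pos.find(p);
        result.push_back(it == first_pos.end() ? -1 : it->second);  // -1 = no match
    }
    return result;
}
```

Interning is not free (each new string is hashed once by content); the saving comes from paying that cost when strings are created rather than on every lookup.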

gagolews, Jun 06 '14

To do: stri_is_coll + pmatch + %in%?
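pmatch() performs partial matching. As a possible starting point for a stringi counterpart, here is a simplified C++ sketch of the rules documented in R's ?pmatch for duplicates.ok = TRUE. An exact match takes precedence. Otherwise an element matches only if it is a prefix of exactly one table entry. An empty string matches nothing. The name pmatch_dupok is hypothetical, comparison is byte-wise, and R's default duplicates.ok = FALSE behaviour (each table entry used at most once) is not modelled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified pmatch(x, table, duplicates.ok = TRUE). Returns 0-based
// positions, -1 for "no match". Naive O(|x| * |table|) scan.
std::vector<long> pmatch_dupok(const std::vector<std::string>& x,
                               const std::vector<std::string>& table) {
    std::vector<long> result;
    result.reserve(x.size());
    for (const std::string& s : x) {
        long pos = -1;
        if (!s.empty()) {  // an empty input string never matches
            long exact = -1, partial = -1;
            int n_partial = 0;
            for (std::size_t j = 0; j < table.size(); ++j) {
                const std::string& t = table[j];
                if (t == s) { exact = static_cast<long>(j); break; }  // first exact match wins
                if (t.compare(0, s.size(), s) == 0) {  // s is a proper prefix of t
                    ++n_partial;
                    partial = static_cast<long>(j);
                }
            }
            if (exact >= 0) pos = exact;
            else if (n_partial == 1) pos = partial;  // only unique partial matches count
        }
        result.push_back(pos);
    }
    return result;
}
```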

gagolews, Jun 06 '14

Also consider sorted haystacks (binary search/...).
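For a haystack that is already sorted, no hash table is needed: each needle can be located by binary search in O(log n). Below is a minimal C++ sketch. It assumes the table is sorted by byte-wise comparison (std::string's operator<, i.e. fixed rather than locale-aware collation). sorted_match is a hypothetical name, and -1 stands for NA.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Position of each needle in a byte-wise sorted table, or -1 if absent.
std::vector<long> sorted_match(const std::vector<std::string>& x,
                               const std::vector<std::string>& sorted_table) {
    // Precondition; checked once per call, in debug builds only.
    assert(std::is_sorted(sorted_table.begin(), sorted_table.end()));
    std::vector<long> result;
    result.reserve(x.size());
    for (const std::string& s : x) {
        // lower_bound returns the first element >= s; with duplicates this is
        // the first occurrence, consistent with match()'s "first position" rule.
        auto it = std::lower_bound(sorted_table.begin(), sorted_table.end(), s);
        if (it != sorted_table.end() && *it == s)
            result.push_back(static_cast<long>(it - sorted_table.begin()));
        else
            result.push_back(-1);
    }
    return result;
}
```

The returned positions refer to the sorted table. If it was obtained by sorting some original vector, they must be mapped back through the sorting permutation (as returned by R's order()). If the needles are sorted as well, a single merge-style pass over both vectors would avoid the log factor.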

gagolews, Mar 27 '16