bytestring
bytestring copied to clipboard
elemIndexEnd non-optimally implemented in terms of findIndexEnd
The current implementations of the elemIndexEnd
function in modules Data.ByteString
and Data.ByteString.Lazy
are uniformly defined as findIndexEnd . (==)
(the Char8
version merely invokes the strict bytestring definition after converting from Char
to Word8
).
This seems counterintuitive when compared with elemIndex
, which is more optimized than findIndex
through the use of the memchr
C FFI call to avoid costly byte-by-byte predicate testing.
There exists a GNU extension for string.h
that defines an operation memrchr
that performs a similar operation to memchr
but returns the final occurrence of a byte rather than the first, which could be used when available. Even without such platform-specific optimizations, it should still be possible to either add a memrchr
-like function to the cbits code.
Even without FFI calls, elemIndexEnd
could use the same logic as findIndexEnd
and perform a byte-by-byte direct equality test at least as efficiently as findIndexEnd . (==)
, but without the indirection.
The implementation of elemIndexEnd
was recently discussed here: https://github.com/haskell/bytestring/pull/155#issuecomment-628619601 It seems that thanks to inlining there is no performance penalty.
AFAIU Windows systems do not provide memrchr
, so the only option to remain cross-platform is to implement memrchr
in cbits
. I do not mind against such approach, if supported by benchmarks.