case-insensitive
CI ByteString is slow
Constructing a CI ByteString allocates pinned memory. The ByteString is usually short, so this not only adds overhead but also contributes to heap fragmentation. I think we can do better here. Any ideas?
Since we have to construct a new ByteString to foldCase the original, we can't avoid asking for pinned memory.
What we could do is add an instance FoldCase ShortByteString. Care to write a PR?
OK, I'll send one. Please reopen this issue to track it.
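The ShortByteString instance suggested above could look roughly like the following sketch. It is written as a standalone function so that it depends only on the bytestring package; the names foldCaseShort and toLower8 and the byte-level folding rule (ASCII plus the Latin-1 uppercase ranges) are assumptions for illustration, not the library's confirmed implementation. In the library, the body would live in `instance FoldCase ShortByteString`.

```haskell
import qualified Data.ByteString.Short as SBS
import Data.Word (Word8)

-- Hypothetical helper: lower-case a single byte.
-- Assumes ASCII plus Latin-1 uppercase ranges; other bytes pass through.
toLower8 :: Word8 -> Word8
toLower8 w
  |  65 <= w && w <=  90 ||
    192 <= w && w <= 214 ||
    216 <= w && w <= 222 = w + 32
  | otherwise            = w

-- Hypothetical stand-in for the method body of
-- 'instance FoldCase ShortByteString'.
-- ShortByteString lives on the unpinned heap, so the folded copy
-- does not require a pinned allocation.
foldCaseShort :: SBS.ShortByteString -> SBS.ShortByteString
foldCaseShort = SBS.pack . map toLower8 . SBS.unpack
```

Going through an intermediate list keeps the sketch portable across bytestring versions; a real instance would probably fold the bytes in place for speed.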
BTW, what's the purpose of this rewrite rule?
{-# RULES "foldCase/ByteString" foldCase = foldCaseBS #-}
For some reason that RULE made the benchmark faster.
What if we implemented CI using a type family? Then we could keep the original ByteString slice and do a more efficient copy into the FoldedCase ByteString. I think this is the best option, but it has some compatibility issues. What do you think?
type family FoldedCase a where
    FoldedCase B.ByteString  = Short.ShortByteString
    FoldedCase BL.ByteString = [Short.ShortByteString]
    FoldedCase T.Text        = T.Text
    FoldedCase TL.Text       = TL.Text

data CI s = CI { original   :: !s               -- ^ Retrieve the original string-like value.
               , foldedCase :: !(FoldedCase s)  -- ^ Retrieve the case folded string-like value.
                                                -- (Also see 'foldCase').
               }
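To make the type-family proposal concrete, here is a minimal sketch restricted to strict ByteString. The constructor function mkBS, the comparison eqCI, and the helper toLower8 are hypothetical names introduced for illustration, and the folding rule is an assumption.

```haskell
{-# LANGUAGE TypeFamilies #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Short as Short
import Data.Word (Word8)

-- Only the strict ByteString equation of the proposal is shown.
type family FoldedCase a where
  FoldedCase B.ByteString = Short.ShortByteString

data CI s = CI
  { original   :: !s
  , foldedCase :: !(FoldedCase s)
  }

-- Hypothetical byte-level folding (ASCII plus Latin-1 uppercase ranges).
toLower8 :: Word8 -> Word8
toLower8 w
  |  65 <= w && w <=  90 ||
    192 <= w && w <= 214 ||
    216 <= w && w <= 222 = w + 32
  | otherwise            = w

-- Hypothetical smart constructor: the original slice is kept untouched,
-- and the folded copy is built as an unpinned ShortByteString.
mkBS :: B.ByteString -> CI B.ByteString
mkBS bs = CI bs (Short.pack (map toLower8 (B.unpack bs)))

-- Comparison looks only at the folded representation.
eqCI :: CI B.ByteString -> CI B.ByteString -> Bool
eqCI a b = foldedCase a == foldedCase b
```

The compatibility cost is visible in the types: foldedCase no longer returns the same type as original, so existing callers that expect a ByteString back would need to change.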
Another reason I propose this solution is that the documentation of ShortByteString says: "It is suitable for use as an internal representation for code that needs to keep many short strings in memory, but it should not be used as an interchange type."
Another approach is to provide a Data.CaseInsensitive.ByteString module which exports a specialized CIByteString type using ShortByteString internally. So, counting the ShortByteString instance, we have three options here.
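For comparison, the specialized-type option might look like the sketch below. Everything here other than the name CIByteString is an assumption: the field names, the mkCI constructor, and the folding rule are hypothetical. In a real package this would sit in its own module with the constructor hidden.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Short as SBS
import Data.Word (Word8)

-- Monomorphic CI type: interchange value outside, short string inside.
data CIByteString = CIByteString
  { ciOriginal :: !B.ByteString          -- original bytes, as given
  , ciFolded   :: !SBS.ShortByteString   -- case-folded copy, unpinned
  }

-- Equality and ordering ignore the original casing.
instance Eq CIByteString where
  a == b = ciFolded a == ciFolded b

instance Ord CIByteString where
  compare a b = compare (ciFolded a) (ciFolded b)

-- Hypothetical byte-level folding (ASCII plus Latin-1 uppercase ranges).
toLower8 :: Word8 -> Word8
toLower8 w
  |  65 <= w && w <=  90 ||
    192 <= w && w <= 214 ||
    216 <= w && w <= 222 = w + 32
  | otherwise            = w

-- Hypothetical smart constructor.
mkCI :: B.ByteString -> CIByteString
mkCI bs = CIByteString bs (SBS.pack (map toLower8 (B.unpack bs)))
```

Because the type is monomorphic, it needs no language extensions and leaves the existing CI type untouched, at the price of a second, parallel API.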