case-insensitive

CI ByteString is slow

Open winterland1989 opened this issue 9 years ago • 5 comments

Constructing a CI ByteString allocates pinned memory, but the ByteString is usually short, so this not only adds overhead but also contributes to heap fragmentation. I think we can do better here. Any ideas?

winterland1989, Oct 21 '16 03:10

Since we have to construct a new ByteString to foldCase the original, we can't avoid asking for pinned memory.

What we could do is add an instance FoldCase ShortByteString. Care to write a PR?

basvandijk, Oct 21 '16 06:10
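For illustration, here is a rough sketch of what case folding over ShortByteString could look like. The names `toLowerAscii` and `foldCaseShort` are hypothetical and not part of the library, and the folding is ASCII-only as a simplifying assumption. It goes through an intermediate list, so it is not tuned for speed; the point is only that no pinned ByteString is allocated along the way.

```haskell
import qualified Data.ByteString.Short as SBS
import Data.ByteString.Short (ShortByteString)
import Data.Word (Word8)

-- | Lowercase a single ASCII byte; all other bytes pass through unchanged.
-- (Assumption: ASCII-only folding, chosen to keep the sketch small.)
toLowerAscii :: Word8 -> Word8
toLowerAscii w
  | w >= 65 && w <= 90 = w + 32   -- 'A'..'Z' -> 'a'..'z'
  | otherwise          = w

-- | Hypothetical case fold for ShortByteString. It unpacks to a list,
-- folds each byte, and packs again, so the result lives in unpinned memory.
foldCaseShort :: ShortByteString -> ShortByteString
foldCaseShort = SBS.pack . map toLowerAscii . SBS.unpack
```

A `FoldCase ShortByteString` instance could then presumably just set `foldCase = foldCaseShort`.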

OK, I'll send one. Please reopen to track this.

BTW, what's the purpose of this rewrite rule?

{-# RULES "foldCase/ByteString" foldCase = foldCaseBS #-}

winterland1989, Oct 21 '16 07:10

For some reason that RULE made the benchmark faster.

basvandijk, Oct 21 '16 14:10

What if we implemented CI using a type family? Then we could keep the original ByteString slice and do a more efficient copy into the FoldedCase ByteString. I think this is the best option, but it has some compatibility issues. What do you think?

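{-# LANGUAGE TypeFamilies #-}

import qualified Data.ByteString       as B
import qualified Data.ByteString.Lazy  as BL
import qualified Data.ByteString.Short as Short
import qualified Data.Text             as T
import qualified Data.Text.Lazy        as TL

-- Maps each string-like type to the type used to store its folded form.
type family FoldedCase a where
    FoldedCase B.ByteString = Short.ShortByteString
    FoldedCase BL.ByteString = [Short.ShortByteString]
    FoldedCase T.Text = T.Text
    FoldedCase TL.Text = TL.Text

data CI s = CI { original   :: !s -- ^ Retrieve the original string-like value.
               , foldedCase :: !(FoldedCase s) -- ^ Retrieve the case folded string-like value.
                                               --   (Also see 'foldCase').
               }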

Another reason I propose this solution is that the documentation of ShortByteString says: "It is suitable for use as an internal representation for code that needs to keep many short strings in memory, but it should not be used as an interchange type."

winterland1989, Oct 24 '16 04:10
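To make the type-family idea a bit more concrete, here is a minimal sketch restricted to strict ByteString. Everything beyond `FoldedCase`, `CI`, `original` and `foldedCase` is an assumption made for illustration: the class `FoldCaseTo`, the functions `mk` and `sameCI`, and the ASCII-only `toLowerAscii` are hypothetical names, not the library's API.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}

import qualified Data.ByteString       as B
import qualified Data.ByteString.Short as SBS
import Data.Word (Word8)

-- Folded representation per string type (only ByteString in this sketch).
type family FoldedCase a where
  FoldedCase B.ByteString = SBS.ShortByteString

-- Hypothetical class: fold a value directly into its FoldedCase type.
class FoldCaseTo s where
  foldCaseTo :: s -> FoldedCase s

-- ASCII-only lowercase (simplifying assumption).
toLowerAscii :: Word8 -> Word8
toLowerAscii w
  | w >= 65 && w <= 90 = w + 32
  | otherwise          = w

instance FoldCaseTo B.ByteString where
  -- Goes via a list so that no second pinned ByteString is allocated.
  foldCaseTo = SBS.pack . map toLowerAscii . B.unpack

data CI s = CI
  { original   :: !s               -- the value as given, e.g. a slice
  , foldedCase :: !(FoldedCase s)  -- compact folded copy used for comparison
  }

-- Hypothetical smart constructor.
mk :: FoldCaseTo s => s -> CI s
mk s = CI s (foldCaseTo s)

-- Equality looks only at the folded copy.
sameCI :: Eq (FoldedCase s) => CI s -> CI s -> Bool
sameCI a b = foldedCase a == foldedCase b
```

The compatibility concern shows up in the types: `foldedCase` no longer returns `s`, so existing callers that expect a ByteString back would need changes.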

Another approach is to provide a Data.CaseInsensitive.ByteString module which exports a specialized CIByteString type using ShortByteString internally. So, together with providing a ShortByteString instance, we have three options here.

winterland1989, Oct 24 '16 05:10
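The specialized-module option might look roughly like the sketch below. Only the type name `CIByteString` comes from the comment above; the field names, `mkCI`, and the ASCII-only folding are assumptions made for illustration.

```haskell
import qualified Data.ByteString       as B
import qualified Data.ByteString.Short as SBS
import Data.Word (Word8)

-- Specialized case-insensitive ByteString: the original is kept as is,
-- while the folded copy is stored as an unpinned ShortByteString.
data CIByteString = CIByteString
  { ciOriginal :: !B.ByteString
  , ciFolded   :: !SBS.ShortByteString
  }

-- ASCII-only lowercase (simplifying assumption).
toLowerAscii :: Word8 -> Word8
toLowerAscii w
  | w >= 65 && w <= 90 = w + 32
  | otherwise          = w

-- Hypothetical constructor: fold once at construction time.
mkCI :: B.ByteString -> CIByteString
mkCI bs = CIByteString bs (SBS.pack (map toLowerAscii (B.unpack bs)))

-- Comparisons use only the folded copy.
instance Eq CIByteString where
  a == b = ciFolded a == ciFolded b

instance Ord CIByteString where
  compare a b = compare (ciFolded a) (ciFolded b)
```

Compared with the type-family variant, this would leave the existing `CI` type untouched, at the cost of a second, parallel API.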