String with Encoding

Open vincenthz opened this issue 8 years ago • 3 comments

Expose a String type with a slightly different name that doesn't assume one specific encoding.

data EString encoding = EString (UArray Word8)

where each encoding is a specific, distinct type such as UTF8 or UTF16.

class Encoding e
   ...
data UTF8
instance Encoding UTF8
data UTF16
instance Encoding UTF16

Operations would look like:

break :: Encoding e => (Char -> Bool) -> EString e -> (EString e, EString e)
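To make the proposal concrete, here is a minimal sketch of how such a `break` could be written against a phantom-typed string. It is not foundation's implementation: a plain `[Word8]` stands in for `UArray Word8`, and the class method `decodeOne`, the tags `ASCII7` and `UTF16LE`, and the name `breakE` (chosen to avoid clashing with the Prelude's `break`) are all assumptions made for illustration.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Bits (shiftL, (.|.))
import Data.Char (chr)
import Data.Proxy (Proxy (..))
import Data.Word (Word8)

-- | Raw bytes tagged with a phantom encoding type. A list stands in for
-- foundation's UArray Word8 to keep the sketch dependency-free.
newtype EString encoding = EString [Word8]
  deriving (Eq, Show)

-- | Hypothetical class: decode one character from the front of the bytes,
-- returning it together with the number of bytes it occupied.
class Encoding e where
  decodeOne :: Proxy e -> [Word8] -> Maybe (Char, Int)

data ASCII7
data UTF16LE

instance Encoding ASCII7 where
  decodeOne _ (b : _) | b < 0x80 = Just (chr (fromIntegral b), 1)
  decodeOne _ _ = Nothing

-- Simplification: only 2-byte code units are decoded; surrogate pairs are
-- not combined.
instance Encoding UTF16LE where
  decodeOne _ (lo : hi : _) =
    Just (chr (fromIntegral hi `shiftL` 8 .|. fromIntegral lo), 2)
  decodeOne _ _ = Nothing

-- | Split at the first character satisfying the predicate. Decoding stops at
-- the first match, at invalid input, or at the end of the bytes.
breakE :: forall e. Encoding e => (Char -> Bool) -> EString e -> (EString e, EString e)
breakE p (EString bytes) = go 0 bytes
  where
    go n rest = case decodeOne (Proxy :: Proxy e) rest of
      Just (c, len) | not (p c) -> go (n + len) (drop len rest)
      _ -> let (a, b) = splitAt n bytes in (EString a, EString b)
```

The same `breakE` then works on either encoding, and the split always lands on a character boundary of the tagged encoding rather than on an arbitrary byte.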

vincenthz • Aug 17 '16 11:08

This seems like a lot of complexity just for people using broken encodings...

ndmitchell • Aug 17 '16 12:08

Foundation already provides an Encoding class with an associated type, Unit encoding:

class Encoding encoding where
    -- | the unit element used for the encoding,
    -- i.e. Word8 for ASCII7 or UTF8, Word16 for UTF16...
    --
    type Unit encoding
    ...

What about using a generalized algebraic data type (GADT) to make use of this associated type? Something like:

data EString encoding where
  EString :: Encoding encoding => UArray (Unit encoding) -> EString encoding
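As a rough, self-contained sketch of this idea (not foundation's actual definitions), the following uses a list in place of `UArray` and redefines a cut-down `Encoding` class locally. The method `encodingName` and the helpers `unitCount` and `describe` are hypothetical and exist only to show that the class dictionary and the unit type both travel with the value.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import Data.Word (Word16, Word8)

-- | Cut-down stand-in for the Encoding class, keeping only the associated
-- Unit type plus one hypothetical method used for demonstration.
class Encoding encoding where
  type Unit encoding :: Type
  encodingName :: proxy encoding -> String

data UTF8
data UTF16

instance Encoding UTF8 where
  type Unit UTF8 = Word8
  encodingName _ = "UTF8"

instance Encoding UTF16 where
  type Unit UTF16 = Word16
  encodingName _ = "UTF16"

-- | The storage element type is chosen by the encoding, and the constructor
-- captures the Encoding dictionary. A list stands in for UArray.
data EString encoding where
  EString :: Encoding encoding => [Unit encoding] -> EString encoding

-- | Number of code units (not bytes, not characters).
unitCount :: EString encoding -> Int
unitCount (EString units) = length units

-- | Pattern matching on the constructor brings the Encoding instance into
-- scope, so no class constraint is needed in the signature.
describe :: EString encoding -> String
describe s@(EString _) = encodingName s ++ ": " ++ show (unitCount s) ++ " units"
```

With this shape, `EString [0x68, 0x69] :: EString UTF16` stores two `Word16` units, whereas the `UArray Word8` representation from the original proposal would store four bytes for the same text.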

NicolasDP • Aug 17 '16 19:08

I don't think Unit encoding is very useful for the outside world; the IO system will very likely interact in terms of UArray Word8.

I think the only usefulness would be to have typed strings when dealing with the foreign world. For example, being able to say:

foreign :: Ptr (EString UTF16) -> ...

This could be useful in the future. Also, this way you can cheaply tag any buffer that has a specific encoding other than UTF8, without transforming the buffer (and still be able to do some textual operations, since we can make it Sequential). I don't think I'll use this very much overall, but we have a flexible system where this can be done cheaply API-wise (i.e. by exposing one type).
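The "cheap tagging" point can be illustrated with a small sketch, again using `[Word8]` in place of `UArray Word8`. The helpers `tagAs`, `untag`, `retag`, and `utf16ByteLength` are hypothetical names, not part of foundation. Because the encoding parameter is phantom, tagging is just a newtype wrap, and retagging can go through `Data.Coerce.coerce` without touching the bytes.

```haskell
import Data.Coerce (coerce)
import Data.Word (Word8)

-- | Bytes tagged with a phantom encoding; the tag has no runtime cost.
newtype EString encoding = EString [Word8]
  deriving (Eq, Show)

data UTF8
data UTF16

-- | Attach an encoding tag to raw bytes. Nothing is copied or validated, so
-- the caller is asserting that the bytes really are in that encoding.
tagAs :: [Word8] -> EString encoding
tagAs = EString

-- | Drop the tag and recover the underlying bytes.
untag :: EString encoding -> [Word8]
untag (EString bytes) = bytes

-- | Reinterpret the same bytes under another tag. This compiles because the
-- type parameter is phantom; it is an unchecked assertion, not a conversion.
retag :: EString a -> EString b
retag = coerce

-- | Stand-in for a foreign-facing function that only accepts UTF16 buffers.
utf16ByteLength :: EString UTF16 -> Int
utf16ByteLength = length . untag
```

Passing an `EString UTF8` to `utf16ByteLength` is rejected at compile time, which is the kind of safety the typed `Ptr (EString UTF16)` signature above is after.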

vincenthz • Aug 17 '16 20:08