Potential approach to avoiding conversion boilerplate
I was recently considering approaches for avoiding conversions to / from foundation types. An idea occurred to me: you could use alternate versions of bytestring / text / etc, which provide a nearly compatible API while using foundation types underneath. One tricky part for packages like bytestring is handling the exposed internal constructor. However, pattern synonyms can work around this to some extent:
diff --git a/Data/ByteString/Internal.hs b/Data/ByteString/Internal.hs
index 031403e..c73cfc6 100644
--- a/Data/ByteString/Internal.hs
+++ b/Data/ByteString/Internal.hs
@@ -1,6 +1,7 @@
{-# LANGUAGE CPP, ForeignFunctionInterface, BangPatterns #-}
{-# LANGUAGE UnliftedFFITypes, MagicHash,
UnboxedTuples, DeriveDataTypeable #-}
+{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
#if __GLASGOW_HASKELL__ >= 703
{-# LANGUAGE Unsafe #-}
#endif
@@ -26,7 +27,8 @@
module Data.ByteString.Internal (
-- * The @ByteString@ type and representation
- ByteString(..), -- instances: Eq, Ord, Show, Read, Data, Typeable
+ ByteString, -- instances: Eq, Ord, Show, Read, Data, Typeable
+ pattern PS,
-- * Conversion with lists: packing and unpacking
packBytes, packUptoLenBytes, unsafePackLenBytes,
@@ -133,6 +135,9 @@ import GHC.Ptr (Ptr(..), castPtr)
-- CFILES stuff is Hugs only
{-# CFILES cbits/fpstring.c #-}
+import qualified Foundation as F
+import qualified Foundation.Array.Internal as F
+
-- -----------------------------------------------------------------------------
-- | A space-efficient representation of a 'Word8' vector, supporting many
@@ -141,35 +146,25 @@ import GHC.Ptr (Ptr(..), castPtr)
-- A 'ByteString' contains 8-bit bytes, or by using the operations from
-- "Data.ByteString.Char8" it can be interpreted as containing 8-bit
-- characters.
---
-data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
- {-# UNPACK #-} !Int -- offset
- {-# UNPACK #-} !Int -- length
- deriving (Typeable)
-
-instance Eq ByteString where
- (==) = eq
+type ByteString = F.UArray Word8
-instance Ord ByteString where
- compare = compareBytes
-
-#if MIN_VERSION_base(4,9,0)
-instance Semigroup ByteString where
- (<>) = append
-#endif
+toBS :: F.UArray Word8 -> (ForeignPtr Word8, Int, Int)
+toBS v = unsafeDupablePerformIO $ do
+ let !(F.CountOf len) = F.length v
+ fp <- mallocByteString len
+ withForeignPtr fp $ \dst -> F.withPtr v $ \src -> memcpy dst src len
+ return $! (fp, 0, len)
-instance Monoid ByteString where
- mempty = PS nullForeignPtr 0 0
-#if MIN_VERSION_base(4,9,0)
- mappend = (<>)
-#else
- mappend = append
-#endif
- mconcat = concat
+pattern PS fptr off len <- (toBS -> (fptr, off, len)) where
+ PS fptr off len = F.fromForeignPtr (fptr, off, len)
+{-# COMPLETE PS #-}
+{- FIXME(mgsloan): will need a foundation-orphans package for stuff like this.
instance NFData ByteString where
rnf PS{} = ()
+-}
+{- FIXME(mgsloan): Can't be a drop-in replacement for these
instance Show ByteString where
showsPrec p ps r = showsPrec p (unpackChars ps) r
@@ -178,12 +173,7 @@ instance Read ByteString where
instance IsString ByteString where
fromString = packChars
-
-instance Data ByteString where
- gfoldl f z txt = z packBytes `f` unpackBytes txt
- toConstr _ = error "Data.ByteString.ByteString.toConstr"
- gunfold _ _ = error "Data.ByteString.ByteString.gunfold"
- dataTypeOf _ = mkNoRepType "Data.ByteString.ByteString"
+-}
------------------------------------------------------------------------
-- Packing and unpacking from lists
diff --git a/Data/ByteString/Short/Internal.hs b/Data/ByteString/Short/Internal.hs
index dbde958..5bfac96 100644
--- a/Data/ByteString/Short/Internal.hs
+++ b/Data/ByteString/Short/Internal.hs
@@ -1,6 +1,6 @@
{-# LANGUAGE DeriveDataTypeable, CPP, BangPatterns, RankNTypes,
ForeignFunctionInterface, MagicHash, UnboxedTuples,
- UnliftedFFITypes #-}
+ UnliftedFFITypes, PatternSynonyms #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing #-}
#if __GLASGOW_HASKELL__ >= 703
{-# LANGUAGE Unsafe #-}
@@ -45,7 +45,7 @@ module Data.ByteString.Short.Internal (
useAsCStringLen
) where
-import Data.ByteString.Internal (ByteString(..), accursedUnutterablePerformIO, c_strlen)
+import Data.ByteString.Internal (ByteString, pattern PS, accursedUnutterablePerformIO, c_strlen)
import Data.Typeable (Typeable)
import Data.Data (Data(..), mkNoRepType)
@@ -198,7 +198,7 @@ length (SBS _ len) = len
null :: ShortByteString -> Bool
null sbs = length sbs == 0
--- | /O(1)/ 'ShortByteString' index (subscript) operator, starting from 0.
+-- | /O(1)/ 'ShortByteString' index (subscript) operator, starting from 0.
index :: ShortByteString -> Int -> Word8
index sbs i
| i >= 0 && i < length sbs = unsafeIndex sbs i
diff --git a/bytestring.cabal b/bytestring.cabal
index 56494a0..623fd12 100644
--- a/bytestring.cabal
+++ b/bytestring.cabal
@@ -68,7 +68,7 @@ flag integer-simple
default: False
library
- build-depends: base >= 4.2 && < 5, ghc-prim, deepseq
+ build-depends: base >= 4.2 && < 5, ghc-prim, deepseq, foundation
exposed-modules: Data.ByteString
Data.ByteString.Char8
The patch above does allow the whole bytestring package to compile using foundation's UArray Word8 type. I can do:
> Foundation.toList (Data.ByteString.singleton 42)
[42]
I ran into the following issues:
- ByteString has an NFData instance, so UArray needs one too. This will require a package of orphan instances for foundation types.
- ByteString's Show instance behaves differently from the one for UArray Word8, so it can't be a drop-in replacement.
- Similarly, we don't want UArray Word8 to have an IsString instance like ByteString has.
- Using pattern synonyms for ByteString's internal rep almost works perfectly, except when it is imported explicitly. In other words, import Data.ByteString.Internal (ByteString(..)) doesn't import the PS pattern. This seems to be a rather unfortunate limitation of the pattern synonyms extension.
Most of these seem either resolvable or acceptable behavior differences. The IsString, Show, Read instances for ByteString are worth ditching anyway.
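The pattern-synonym workaround from the patch can be sketched without foundation at all. This is a minimal illustration, not the patch itself: Arr, Bytes, toOld, fromOld and byteLength are hypothetical names, and a plain list stands in for both UArray Word8 and the ForeignPtr payload. It assumes GHC 8.2 or later for the COMPLETE pragma.

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}

import Data.Word (Word8)

-- Hypothetical stand-in for foundation's array type (not the real UArray).
newtype Arr = Arr [Word8] deriving (Eq, Show)

-- As in the patch, the public type is only a synonym for the new representation.
type Bytes = Arr

-- View the new representation as an old-style (payload, offset, length) triple.
toOld :: Arr -> ([Word8], Int, Int)
toOld (Arr ws) = (ws, 0, length ws)

-- Rebuild the new representation from an old-style triple.
fromOld :: ([Word8], Int, Int) -> Arr
fromOld (ws, off, len) = Arr (take len (drop off ws))

-- Explicitly bidirectional synonym: matching goes through toOld,
-- construction goes through fromOld.
pattern PS :: [Word8] -> Int -> Int -> Bytes
pattern PS payload off len <- (toOld -> (payload, off, len)) where
  PS payload off len = fromOld (payload, off, len)
{-# COMPLETE PS #-}

-- Old-style client code that matches on the constructor keeps compiling.
byteLength :: Bytes -> Int
byteLength (PS _ _ len) = len
```

If this lived in its own module, a client would have to import it as (Bytes, pattern PS). Because Bytes is a type synonym rather than a data type, PS cannot be bundled with it in the export list, which is why the Bytes(..) import form does not bring PS into scope.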
So, where to go from here? Well, if people think this is a good idea, then it would make sense to make a larger change to ByteString / Text that implements the operations in terms of foundation operations, to avoid the conversion overhead. It's actually possible that in practice the conversion overhead could be greatly reduced via https://github.com/haskell-foundation/foundation/issues/452 and GHC's overall cleverness. That would require benchmarking.
For NFData and Hashable I think the pragmatic thing is to just have foundation itself depend on those packages, just for the type classes. It vastly eases integration even though I appreciate it isn't quite "the foundation" approach.
I do like the idea of having a foundation version of bytestring and text - given that foundation is usually faster (in my benchmarks at least) having an easy "try it and see" option would be an easy way in for new users.
@mgsloan that's a very interesting approach, given what it offers. It's probably OK given that the APIs of bytestring, text and other core packages are so frozen, but it could be a bit of a maintenance overhead. Would maintaining this in its own repo, maybe one repo for all the packages, be a good way?
For instances, I don't see a good way to sort this out, apart from having a newtype wrapper over UArray. We could centralize the orphan instances in edge though and have a single generic instance like:
instance Basement.NormalForm a => DeepSeq.NFData a where
...
(this could create a problem for things that want to be NFData and NormalForm though)
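The newtype-wrapper alternative mentioned above might look roughly like the following. This is only a sketch under stated assumptions: a list of Word8 (the hypothetical Arr) stands in for UArray Word8, and unwrap is an illustrative helper, not part of any existing package.

```haskell
import Data.Char (chr, ord)
import Data.String (IsString (..))
import Data.Word (Word8)

-- Hypothetical stand-in for foundation's UArray Word8.
type Arr = [Word8]

-- Wrapping the array in a newtype lets the compatibility layer define its
-- own instances, so none of them has to be an orphan on the array type.
newtype ByteString = ByteString Arr
  deriving (Eq, Ord)

-- Render bytes as 8-bit characters, mirroring bytestring's Show behaviour.
instance Show ByteString where
  showsPrec p (ByteString ws) = showsPrec p (map (chr . fromIntegral) ws)

-- Truncate each Char to 8 bits, as bytestring's IsString instance does.
instance IsString ByteString where
  fromString = ByteString . map (fromIntegral . ord)

-- Getting back to the underlying array is explicit, but free at runtime.
unwrap :: ByteString -> Arr
unwrap (ByteString ws) = ws
```

The trade-off is that ByteString and the underlying array are then distinct types again, so explicit wrapping and unwrapping returns at API boundaries, even though a newtype costs nothing at runtime.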
@ndmitchell until those core packages are properly maintained (e.g. making alpha compatible releases), there's just no way that's going to happen. foundation is compatible with 8.4.1-alpha out of the box thanks to this principled position. I think we're all in agreement that it's definitely not the simplest path to walk and that it leads to difficulty in terms of right-now capabilities.
@vincenthz "principled position" - I don't think you can take the moral high ground merely by avoiding abject stupidity 😉
There's no moral high ground here; the reason at the center is mainly technical: to make stuff that works much better together. The rest is just gravy.