cborg
Typeclass instances & packaging
(@dcoutts asked me to move this to the issue tracker for wider discussion)
Currently, lib:binary-serialise-cbor depends on
├─ binary-serialise-cbor-0.1.1.0
│ ├─ array-0.5.1.1 ┄┄
│ ├─ base-4.9.1.0 ┄┄
│ ├─ bytestring-0.10.8.1 ┄┄
│ ├─ containers-0.5.7.1 ┄┄
│ ├─ ghc-prim-0.5.0.0 ┄┄
│ ├─ half-0.2.2.3
│ │ └─ base-4.9.1.0 ┄┄
│ ├─ hashable-1.2.6.1 ┄┄
│ ├─ integer-gmp-1.0.0.1 ┄┄
│ ├─ old-locale-1.0.0.7 ┄┄
│ ├─ primitive-0.6.2.0 ┄┄
│ ├─ text-1.2.2.2 ┄┄
│ ├─ time-1.6.0.1 ┄┄
│ ├─ unordered-containers-0.2.8.0 ┄┄
│ └─ vector-0.12.0.1 ┄┄
Which looks fine, except for mostly
- vector
- unordered-containers
- half

which are dependencies which seem to have been included merely for providing Serialise class instances for their types.
Compare this to binary, whose dep-tree looks like
binary-0.8.5.1
├─ array-0.5.1.1
│ └─ base-4.9.1.0
│ ├─ ghc-prim-0.5.0.0
│ │ └─ rts-1.0
│ ├─ integer-gmp-1.0.0.1
│ │ └─ ghc-prim-0.5.0.0 ┄┄
│ └─ rts-1.0 ┄┄
├─ base-4.9.1.0 ┄┄
├─ bytestring-0.10.8.1
│ ├─ base-4.9.1.0 ┄┄
│ ├─ deepseq-1.4.2.0
│ │ ├─ array-0.5.1.1 ┄┄
│ │ └─ base-4.9.1.0 ┄┄
│ ├─ ghc-prim-0.5.0.0 ┄┄
│ └─ integer-gmp-1.0.0.1 ┄┄
└─ containers-0.5.7.1
├─ array-0.5.1.1 ┄┄
├─ base-4.9.1.0 ┄┄
├─ deepseq-1.4.2.0 ┄┄
└─ ghc-prim-0.5.0.0 ┄┄
There are two basic ends of the spectrum for organising class-providing vs. type-providing packages while avoiding orphans:
1. Have instances live in the class-providing packages by having them depend on all interesting type-providing packages.
2. Have class-providing packages be rather lean so that type-providing packages can depend on them.
I argue that 2. is the more desirable approach. The typeclass usually evolves more slowly than the type-providing packages, so with approach 1. keeping track of all type-providing packages would mean updating the class-providing package more frequently to accommodate new major versions (and possibly supporting multiple major versions at once!); moreover, this can result in more version-conflict pressure. Finally, as 1. leads to a package with a heavier dependency footprint, it reduces the incentive for other packages to depend on it in order to provide instances, which in turn reinforces the need for the class-providing package to pick up even more dependencies to provide instances itself.
Of course, this all assumes wanting to avoid orphan instances at all costs.
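To make approach 2. concrete, here is a minimal single-file sketch of how the pieces would be split across packages. Everything in it is hypothetical: the `ToBytes` class stands in for a class such as Serialise, `Flags` stands in for a type from some type-providing package, and the comments mark which package each part would live in. It is not the actual API of any of the packages discussed here.

```haskell
import Data.Word (Word8)

-- Would live in the lean class-providing package (approach 2.):
-- the class itself plus instances for types from base only.
-- 'ToBytes' is a hypothetical stand-in for a class like Serialise.
class ToBytes a where
  toBytes :: a -> [Word8]

instance ToBytes Bool where
  toBytes False = [0]
  toBytes True  = [1]

instance ToBytes a => ToBytes [a] where
  toBytes = concatMap toBytes

-- Would live in a type-providing package that depends on the
-- class-providing package. The instance is defined next to the
-- type, so it is not an orphan.
-- 'Flags' is a hypothetical example type.
newtype Flags = Flags [Bool]

instance ToBytes Flags where
  toBytes (Flags bs) = toBytes bs
```

With this layout the type-providing package only pays for the class-providing package's (small) dependency footprint, and loading the file in GHCi lets you evaluate e.g. `toBytes (Flags [True, False, True])`.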
Assuming we agree that 2. is the more desirable ideal, the current dependency footprint of binary-serialise-cbor reduces the incentive for packages that would have been fine with adding binary (to provide Binary instances) to also add binary-serialise-cbor as a dependency. Take uuid-types for example:
uuid-types-1.0.3
├─ base-4.9.1.0
│ ├─ ghc-prim-0.5.0.0
│ │ └─ rts-1.0
│ ├─ integer-gmp-1.0.0.1
│ │ └─ ghc-prim-0.5.0.0 ┄┄
│ └─ rts-1.0 ┄┄
├─ binary-0.8.3.0
│ ├─ array-0.5.1.1
│ │ └─ base-4.9.1.0 ┄┄
│ ├─ base-4.9.1.0 ┄┄
│ ├─ bytestring-0.10.8.1
│ │ ├─ base-4.9.1.0 ┄┄
│ │ ├─ deepseq-1.4.2.0
│ │ │ ├─ array-0.5.1.1 ┄┄
│ │ │ └─ base-4.9.1.0 ┄┄
│ │ ├─ ghc-prim-0.5.0.0 ┄┄
│ │ └─ integer-gmp-1.0.0.1 ┄┄
│ └─ containers-0.5.7.1
│ ├─ array-0.5.1.1 ┄┄
│ ├─ base-4.9.1.0 ┄┄
│ ├─ deepseq-1.4.2.0 ┄┄
│ └─ ghc-prim-0.5.0.0 ┄┄
├─ bytestring-0.10.8.1 ┄┄
├─ deepseq-1.4.2.0 ┄┄
├─ hashable-1.2.6.1
│ ├─ base-4.9.1.0 ┄┄
│ ├─ bytestring-0.10.8.1 ┄┄
│ ├─ deepseq-1.4.2.0 ┄┄
│ ├─ ghc-prim-0.5.0.0 ┄┄
│ ├─ integer-gmp-1.0.0.1 ┄┄
│ └─ text-1.2.2.2
│ ├─ array-0.5.1.1 ┄┄
│ ├─ base-4.9.1.0 ┄┄
│ ├─ binary-0.8.3.0 ┄┄
│ ├─ bytestring-0.10.8.1 ┄┄
│ ├─ deepseq-1.4.2.0 ┄┄
│ ├─ ghc-prim-0.5.0.0 ┄┄
│ └─ integer-gmp-1.0.0.1 ┄┄
├─ random-1.1
│ ├─ base-4.9.1.0 ┄┄
│ └─ time-1.6.0.1
│ ├─ base-4.9.1.0 ┄┄
│ └─ deepseq-1.4.2.0 ┄┄
└─ text-1.2.2.2 ┄┄
...which depends on binary, but depending on binary-serialise-cbor as well would be problematic, as it would transitively add somewhat expensive dependencies (vector & unordered-containers specifically).
To come to a conclusion, here's a compromise:
Move all instances for "standard" types provided by packages that aren't considered GHC boot libs (unless binary-serialise-cbor needs those packages for its implementation; NB: I'd consider text to be boot-lib-ish at this point, given that text is about to become a dep of Cabal) to a designated orphan-instance package, e.g. either binary-serialise-cbor-orphans or binary-serialise-cbor-instances.
As to where to draw the line whether a package is standard, I'd suggest the policy used by e.g. http://hackage.haskell.org/package/quickcheck-instances, that is to "supply ~~QuickCheck~~Serialise instances for types provided by the Haskell Platform."
Over time when binary-serialise-cbor becomes more popular, those instances may hopefully be "adopted" into their respective type-providing packages, thereby becoming non-orphans.
Personally I would have said that those three specific packages are the minimum set of dependencies necessary to support the full range of types supported by CBOR. I would be hesitant to include anything else, but those three seem like they round out the space of encodings nicely (you could argue unordered-containers isn't necessary as we have Data.Map I guess, but it's so widespread I feel it is ok to keep, until it imports this library alongside/instead of binary).
For the more general problem of instances, I don't like either solution but don't have another one.
My main concern is about vector & unordered-containers (half seems like a negligibly light dependency), as it would be difficult to justify uuid-types picking up those extra dependencies just so it can provide non-orphan CBOR instances.
As you mention, for representing "major type 5" containers already suffices (or even just a plain-old list-of-pairs [(key,value)], but I'm not going to advocate that as we already have containers available).
Similarly, for "major type 4" we have two cheaper alternatives to vector as well: H98's array package as well as a plain-old list [a].
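As a rough illustration of why plain lists are enough to express these two major types, here is a toy byte-level encoder using only base. It is a sketch under strong assumptions: it only handles unsigned integers and lengths below 24 (so that the argument fits in the initial byte), and the function names are made up for this example rather than taken from cborg.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8)

-- Initial byte of a CBOR data item: the major type in the top three
-- bits, the argument in the low five bits. Only arguments below 24
-- fit directly; anything larger is out of scope for this sketch.
header :: Word8 -> Int -> Maybe Word8
header major n
  | n >= 0 && n < 24 = Just ((major `shiftL` 5) .|. fromIntegral n)
  | otherwise        = Nothing

-- Major type 0: unsigned integer (small values only).
encodeSmallUInt :: Int -> Maybe [Word8]
encodeSmallUInt n = (: []) <$> header 0 n

-- Major type 4: array, built from a plain list.
encodeArray :: [Int] -> Maybe [Word8]
encodeArray xs = do
  h    <- header 4 (length xs)
  body <- traverse encodeSmallUInt xs
  pure (h : concat body)

-- Major type 5: map, built from a plain list of key/value pairs.
encodeMap :: [(Int, Int)] -> Maybe [Word8]
encodeMap kvs = do
  h    <- header 5 (length kvs)
  body <- traverse encodePair kvs
  pure (h : concat body)
  where
    encodePair (k, v) = (++) <$> encodeSmallUInt k <*> encodeSmallUInt v
```

For instance, `encodeArray [1, 2, 3]` yields the bytes 0x83 0x01 0x02 0x03 and `encodeMap [(1, 2), (3, 4)]` yields 0xA2 0x01 0x02 0x03 0x04, without any container type beyond the list.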
So perhaps separate cborg-vector and cborg-containers packages (I guess this is just the cborg-instances package, but I'm not sure whether it's best to have one package with heaps of deps that nearly everyone will need, or more granular ones so people have more control over their deps)? We need to have the efficient Vector encoding for primitive types living somewhere, but I agree that having vector as a dep of cborg is less than ideal.
Things should be much better in this regard now since we have split up cborg and serialise, assuming you don't want to use the Serialise class.
@bgamari I'm not sure anything has changed (with the split in its current form), though, as regards the problem of how to provide the canonical CBOR instances for the UUID type (as I understand it, the CBOR specification has defined a specific tagged encoding for UUID values). Unless I'm missing something, it would be of rather modest use for uuid-types to depend on cborg only, as it only provides low-level primitives. IOW, the only thing I can do if uuid-types depends on cborg only is to define two additional primitives
decodeUUID :: Decoder s UUID
encodeUUID :: UUID -> Encoding
but this on its own would still leave users on their own when wanting to serialise their data-types and have e.g. their instances be derived via Generics as e.g. for
data Foo = Foo UUID !Text !Bool !Int !(Maybe UUID) {- ... -}
deriving Generic
instance Serialise Foo -- fails because there's no 'Serialise UUID' instance
So the problem for me still remains where e.g. the Serialise UUID instance shall live. And also a related problem, how something like Cabal or GHC would be able to use CBOR-serialisation if we wanted that (would we have to duplicate/reinvent the typeclasses from the serialise package sans the additional-dependencies-inducing instances into Cabal's code-base?)
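For reference, the tagged UUID encoding mentioned above can be sketched at the byte level using only base. This assumes the binary-UUID tag is 37, as listed in the IANA CBOR tags registry, followed by a 16-byte byte string. `UUIDBytes` and both function names are hypothetical stand-ins; a real implementation would work with the UUID type from uuid-types and cborg's Encoding and Decoder types rather than raw byte lists.

```haskell
import Data.Word (Word8)

-- Hypothetical stand-in for a UUID: its 16 raw bytes.
newtype UUIDBytes = UUIDBytes [Word8]
  deriving (Eq, Show)

-- Tag 37 is major type 6 with an argument >= 24, so it takes one
-- extra byte: 0xD8 0x25. A byte string of length 16 is major type 2
-- with the length in the initial byte: 0x50.
encodeUUIDBytes :: UUIDBytes -> Maybe [Word8]
encodeUUIDBytes (UUIDBytes bs)
  | length bs == 16 = Just ([0xD8, 0x25, 0x50] ++ bs)
  | otherwise       = Nothing

-- Accepts exactly the encoding produced above and nothing else.
decodeUUIDBytes :: [Word8] -> Maybe UUIDBytes
decodeUUIDBytes (0xD8 : 0x25 : 0x50 : bs)
  | length bs == 16 = Just (UUIDBytes bs)
decodeUUIDBytes _ = Nothing
```

Like the two primitives above, this covers only the wire format; it does nothing to settle which package should host a class instance built on top of it.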
In the case of uuid-types it seems there is already precedent for it to provide its instances as uuid-types depends upon binary. Given that uuid-types depends upon binary, we would prefer not to pick it up as a dependency.
For what it's worth, I uploaded to Hackage a library called serialise-uuid that exports both the cborg encoder and decoder from Codec.CBOR.UUID, and an orphan Serialise instance from Codec.Serialise.UUID. It could easily be split into two libraries, cborg-uuid and serialise-uuid, if necessary. If some day cborg or serialise themselves were to export this functionality, serialise-uuid would be deprecated.