primitive icon indicating copy to clipboard operation
primitive copied to clipboard

write up policies for 'Prim' typeclass

Open chessai opened this issue 7 years ago • 3 comments

I think we should include a write-up about what makes sense in a Prim instance. Some specific points I've thought of are what makes a valid Prim instance (laws), what makes sense to have a Prim instance, why certain things are excluded, etc.

For example, I just discussed with @andrewthad about why Bool doesn't have a Prim instance. Aside from space inefficiency, let's say we map True to 1 and False to 0, then writing a byte to a ByteArray# means that only 1 and 0 are valid, and we cannot interpret any byte other than 0 or 1 as a Bool. This makes sense for anything which cannot be talked about in bytes (i.e. its size is not at least 1 byte).

chessai avatar Jul 03 '18 14:07 chessai

  1. Prim's intent / historical context (and Primitive as a whole): to be a type * wrapper around reading writing to memory locations holding the various builtin / primitive ghc types, to hide / manage that stuff for Vector.

This means that anything that doesn't just cleanly map to portable ghc machine semantics is out of scope

some consequences: because GHC's memory allocator only does multiples of a native machine word/pointer, and unlifted unboxed types correspond (always! at least historically ignoring array stuff) to a value that will possibly be passed by register (aka sort of by value), and who's in memory representation on the heap will generally be at least the size of a native word.

This is actually a very good thing by default! Most / all computing hardware / CPU designs do not provide sub-word sized atomic operations, so mutable memory atomics generally work on word (32/64bit ) sized units of memory. (ok, technically the real granularity of CPU atomics is an entire cache line ... but not the point)

the real answer is this: there really really shouldn't be user defined Prim instances, we should move the Array kit into vector, and iterate on the different memory layout semantics choices land scape there, and also make the pure array api's take advantage of applicable fusion framework engineering

cartazio avatar Jul 03 '18 17:07 cartazio

I think I disagree with the intent to be only the workhorse of vector. I think @treeowl put this well in his response to SPJ in the libraries mailing list, in the following thread: https://mail.haskell.org/pipermail/libraries/2018-April/028732.html

I think the areas in the primitive library demonstrate a part of the array design space that seems to have gone relatively unexplored in Haskell: plain old vectors. Unlike the vector library, there is no stream fusion framework. Unlike the array library, there is no class-based reliance on fold/build fusion.

The (flawed) addition of numerous class instances a couple versions ago made primitive arrays seem much more viable to end users as an alternative to their heavier counterparts. I have become interested in fixing the mistakes that were made, seeing just how much performance we can squeeze out (and what limitations we run into), and fleshing out the API.

In short, primitive can be much more than just vector's lackey, and APIs centred around fusion are not always preferable. Consider the following WIP library(ies) by @andrewthad - https://github.com/andrewthad/packed - It contains implementations of Text, ByteString, aeson's Parser, and some other things using primitive - and benchmarks see 2x to 10x speedups in a significant number of operations. This approach is not totally without its downsides, but it is a good case study.

In response to not wanting users to define Prim instances: I don't know what would warrant making Prim effectively a closed typeclass. As long as the type over which an end user wishes to define a Prim instance is a product of ghc primitives, this can be beneficial to users wishing to utilise 'plain-old arrays' to reap memory-related performance benefits.

chessai avatar Jul 06 '18 20:07 chessai

This means that anything that doesn't just cleanly map to portable ghc machine semantics is out of scope.

In what way does a Prim instance for Complex or a prim instance for Identity a not map to portable ghc machine semantics?

I do happen to have several Prim instances in libraries I maintain. In the ip package, I have Prim instances for Mac, IPv4, and IPv6 because in applications I work on, I need to store values of these types in a PrimArray.

we should move the Array kit into vector, and iterate on the different memory layout semantics choices landscape there

What do you mean by this? Do you mean providing functions and typeclasses for working on unsliced arrays in the vector library? This could be interesting, and I think it may meet what I'm looking for, but I'd like to make sure I'm interpreting what you wrote correctly. My understanding is that we would need something like the typeclass from Data.Vector.Generic but without the slicing methods.

benchmarks see 2x to 10x speedups

This is misleading. These speedups are the result of other optimizations, not of using primitive instead of vector.

To sum it up, I'm not really sold on the idea that there shouldn't be user-defined Prim instances, but I'm interested in the idea of adding support for unsliced stuff to vector.

andrewthad avatar Jul 06 '18 21:07 andrewthad