vector Need a Boxed instance for Unboxed (to be able to unbox user data types with some boxed fields)

It may sound counter-intuitive, but it would be really useful for unboxing structures that contain some boxed data.

For example, the docs for As give this little example record data type:

data Foo a = Foo Int a
  deriving Show

It defines a relation between this and a tuple (for which there is an existing Unbox instance)

instance VU.IsoUnbox (Foo a) (Int,a) where
  toURepr (Foo i a) = (i,a)
  fromURepr (i,a) = Foo i a

and you end up defining an instance

instance VU.Unbox a => VU.Unbox (Foo a)

But this requires the a to be unboxable too. What if I have a record member that really does need to be boxed, such as a reference, another array, a string, etc.?

I should instead be able to define the relation like this

instance VU.IsoUnbox (Foo a) (Int, Boxed a) where
  toURepr (Foo i a) = (i, Boxed a)
  fromURepr (i, Boxed a) = Foo i a

where Boxed is a provided newtype that is already an instance of Unbox, so the tuple (Int, Boxed a) is an instance of Unbox too.

Obviously, the representation for a vector of Boxed is just an ordinary boxed vector!

newtype Boxed a = Boxed a

newtype instance VU.MVector s (Boxed a) = MV_Boxed (V.MVector s a)
newtype instance VU.Vector    (Boxed a) = V_Boxed  (V.Vector    a)

instance VGM.MVector VUM.MVector (Boxed a) where
  basicLength (MV_Boxed v) = VGM.basicLength v
  -- ... etc, for all the methods
  
instance VG.Vector VU.Vector (Boxed a) where
  basicUnsafeFreeze (MV_Boxed v) = V_Boxed <$> VG.basicUnsafeFreeze v
  -- ... etc, for all the methods

instance VU.Unbox (Boxed a)

And that's it. Now we can unbox any record, even if some fields are still boxed.

The above code type checks; it is only missing the implementations of the remaining methods, and those just do the obvious boring thing. A PR would be easy.
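
For reference, here is a minimal sketch of what the "obvious boring thing" could look like, with every required method delegating to the underlying boxed vector. It assumes vector 0.13 or later (for IsoUnbox and As); Boxed, MV_Boxed, V_Boxed, MV_Foo, V_Foo and sample are illustrative names taken from or added to this proposal, not names exported by the library.

```haskell
{-# LANGUAGE DerivingVia           #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE StandaloneDeriving    #-}
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE TypeOperators         #-}

import qualified Data.Vector                 as V
import qualified Data.Vector.Generic         as VG
import qualified Data.Vector.Generic.Mutable as VGM
import qualified Data.Vector.Unboxed         as VU
import qualified Data.Vector.Unboxed.Mutable as VUM

-- Hypothetical wrapper from the proposal: "unboxed" vectors of Boxed a
-- are represented by ordinary lazy boxed vectors of a.
newtype Boxed a = Boxed a

newtype instance VU.MVector s (Boxed a) = MV_Boxed (V.MVector s a)
newtype instance VU.Vector    (Boxed a) = V_Boxed  (V.Vector    a)

-- Every method unwraps, delegates to the boxed vector, and rewraps.
instance VGM.MVector VUM.MVector (Boxed a) where
  basicLength (MV_Boxed v)                  = VGM.basicLength v
  basicUnsafeSlice i n (MV_Boxed v)         = MV_Boxed (VGM.basicUnsafeSlice i n v)
  basicOverlaps (MV_Boxed a) (MV_Boxed b)   = VGM.basicOverlaps a b
  basicUnsafeNew n                          = MV_Boxed <$> VGM.basicUnsafeNew n
  basicInitialize (MV_Boxed v)              = VGM.basicInitialize v
  basicUnsafeRead (MV_Boxed v) i            = Boxed <$> VGM.basicUnsafeRead v i
  basicUnsafeWrite (MV_Boxed v) i (Boxed x) = VGM.basicUnsafeWrite v i x

instance VG.Vector VU.Vector (Boxed a) where
  basicUnsafeFreeze (MV_Boxed v)   = V_Boxed <$> VG.basicUnsafeFreeze v
  basicUnsafeThaw (V_Boxed v)      = MV_Boxed <$> VG.basicUnsafeThaw v
  basicLength (V_Boxed v)          = VG.basicLength v
  basicUnsafeSlice i n (V_Boxed v) = V_Boxed (VG.basicUnsafeSlice i n v)
  basicUnsafeIndexM (V_Boxed v) i  = Boxed <$> VG.basicUnsafeIndexM v i

instance VU.Unbox (Boxed a)

-- The record from the example: the Int column is unboxed, the a column is boxed.
data Foo a = Foo Int a
  deriving (Eq, Show)

instance VU.IsoUnbox (Foo a) (Int, Boxed a) where
  toURepr (Foo i a)      = (i, Boxed a)
  fromURepr (i, Boxed a) = Foo i a

newtype instance VU.MVector s (Foo a) = MV_Foo (VU.MVector s (Int, Boxed a))
newtype instance VU.Vector    (Foo a) = V_Foo  (VU.Vector    (Int, Boxed a))
deriving via (Foo a `VU.As` (Int, Boxed a)) instance VGM.MVector VUM.MVector (Foo a)
deriving via (Foo a `VU.As` (Int, Boxed a)) instance VG.Vector   VU.Vector   (Foo a)
instance VU.Unbox (Foo a)  -- note: no Unbox constraint on a

-- Illustrative value: a String payload is not unboxable on its own.
sample :: VU.Vector (Foo String)
sample = VU.fromList [Foo 1 "one", Foo 2 "two"]

main :: IO ()
main = print (VU.toList sample)
```

Because this Boxed is backed by a lazy boxed vector, elements in that column are stored as given and may be unevaluated thunks.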

Sep 20 '24 10:09 dcoutts

Related: it'd also be nice to have specific Unbox instances for types like ByteArray and MutableByteArray that are (morally) unlifted but boxed. This could take advantage of recent GHC's ability to have levity-polymorphic arrays (which ideally should be exposed via Primitive.Array).

And further refinements: we could also have BoxedStrict for cases where we want a vector of boxed data, but want the values in the array to be always evaluated to WHNF. Currently it's actually quite hard to reliably have vectors of boxed but WHNF data.
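
One possible shape for such a wrapper, as a sketch only: the representation is still a boxed vector, but basicUnsafeWrite forces the element to WHNF before storing it. The names BoxedStrict, MV_BoxedStrict, V_BoxedStrict and writeForcesElement are hypothetical, and the sketch relies on the class's default implementations of the remaining methods going through basicUnsafeWrite.

```haskell
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies          #-}

import           Control.Exception           (ErrorCall, evaluate, try)
import qualified Data.Vector                 as V
import qualified Data.Vector.Generic         as VG
import qualified Data.Vector.Generic.Mutable as VGM
import qualified Data.Vector.Unboxed         as VU
import qualified Data.Vector.Unboxed.Mutable as VUM

-- Hypothetical wrapper: boxed storage, elements forced to WHNF on write.
newtype BoxedStrict a = BoxedStrict a

newtype instance VU.MVector s (BoxedStrict a) = MV_BoxedStrict (V.MVector s a)
newtype instance VU.Vector    (BoxedStrict a) = V_BoxedStrict  (V.Vector    a)

instance VGM.MVector VUM.MVector (BoxedStrict a) where
  basicLength (MV_BoxedStrict v)          = VGM.basicLength v
  basicUnsafeSlice i n (MV_BoxedStrict v) = MV_BoxedStrict (VGM.basicUnsafeSlice i n v)
  basicOverlaps (MV_BoxedStrict a) (MV_BoxedStrict b) = VGM.basicOverlaps a b
  basicUnsafeNew n                        = MV_BoxedStrict <$> VGM.basicUnsafeNew n
  basicInitialize (MV_BoxedStrict v)      = VGM.basicInitialize v
  basicUnsafeRead (MV_BoxedStrict v) i    = BoxedStrict <$> VGM.basicUnsafeRead v i
  -- The only difference from a lazy wrapper: seq the element before storing it.
  basicUnsafeWrite (MV_BoxedStrict v) i (BoxedStrict x) =
    x `seq` VGM.basicUnsafeWrite v i x

instance VG.Vector VU.Vector (BoxedStrict a) where
  basicUnsafeFreeze (MV_BoxedStrict v)   = V_BoxedStrict <$> VG.basicUnsafeFreeze v
  basicUnsafeThaw (V_BoxedStrict v)      = MV_BoxedStrict <$> VG.basicUnsafeThaw v
  basicLength (V_BoxedStrict v)          = VG.basicLength v
  basicUnsafeSlice i n (V_BoxedStrict v) = V_BoxedStrict (VG.basicUnsafeSlice i n v)
  basicUnsafeIndexM (V_BoxedStrict v) i  = BoxedStrict <$> VG.basicUnsafeIndexM v i

instance VU.Unbox (BoxedStrict a)

-- True if building a vector from an error thunk raises the error,
-- i.e. the element was forced while it was being written.
writeForcesElement :: IO Bool
writeForcesElement = do
  r <- try (evaluate (VU.fromList [BoxedStrict (error "thunk" :: Int)]))
  pure (either (const True :: ErrorCall -> Bool) (const False) r)

main :: IO ()
main = writeForcesElement >>= print
```

Swapping seq for deepseq (and adding an NFData a constraint) would give a variant that forces elements to normal form instead.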

Sep 20 '24 10:09 dcoutts

Yes. I would say that Unbox is a misnomer. It is about selecting the representation of an array by element type, and that representation need not necessarily be unboxed.

There's also a precedent: UnboxViaPrim uses a primitive vector as its representation. It could be used as a template. I would gladly accept a PR with such an addition. The only question is naming. I think it would be nice to be explicit about the strictness of the underlying array.

I can see two possible names: AsLazyBoxed/AsStrictBoxed, using As as a reference, or UnboxVia{Strict,Lazy}Boxed, which is based on UnboxViaPrim but is long and weird. So the former is probably better.

Sep 20 '24 11:09 Shimuuar

It may sound counter-intuitive, but it would be really useful for unboxing structures that contain some boxed data.

I had similar thoughts a while ago, but never really had a use case for it. So, I do agree that it would be a good idea to add a type like that. However, with respect to the Unbox instance for the Boxed type, I think it would be useful to enforce NFData on the a in Boxed a, because one thing Unboxed vectors give us is the guarantee that all their elements are in NF. In other words, I do agree with @Shimuuar that "it [Unbox] need not necessarily be unboxed", but normal form is a very nice property of unboxed vectors. Others might have a different opinion on the subject.

we could also have BoxedStrict for cases where we want a vector of boxed data, but want the values in the array to be always evaluated to WHNF. Currently it's actually quite hard to reliably have vectors of boxed but WHNF data.

Fairly recently a new interface was added for strict boxed vectors where elements are always in WHNF; it just hasn't been released yet: https://github.com/haskell/vector/pull/488 So I am not really sure of the usefulness of a specialized BoxedStrict newtype for elements.

Sep 20 '24 18:09 lehins

So the question is really about semantics. What are unboxed vectors? What do we expect from implementations? The documentation says little:

The implementation is based on type families and picks an efficient, specialised representation for every element type.

One possibility is to think that Unbox only provides some "efficient" representation of a vector without making any particular promises.

Another one which I haven't considered is to expect that elements are reduced to NF. By enforcing NFData you mean rnfing elements on write, correct?

Sep 21 '24 17:09 Shimuuar

By analogy, I believe there should also be an AsStorable wrapper (or UnboxViaStorable, depending on the naming scheme). In my use case, I get two parallel arrays from a C API, and I think the most efficient high-level abstraction is to wrap both into Storable vectors (via ForeignPtrs), and then combine them just like unboxed vectors for pairs. Elements in Storable vectors are effectively forced to NF, so strictness should not be a problem.

Oct 02 '24 23:10 ruifengx

I'm willing to take a stab at this. If unsuccessful, I'll at least report the problematic pain points for a future contributor.

Oct 10 '24 11:10 recursion-ninja

Another one which I haven't considered is to expect that elements are reduced to NF. By enforcing NFData you mean rnfing elements on write, correct?

The vector package already depends on deepseq, so that seems like a palatable constraint. Perhaps that means strict, compact layouts of values in an Unboxed vector would require the following constraint:

-- General constraint
NFData a => Unbox (AsBoxedStrictly a)


-- Example case with this constraint
instance NFData a => VU.IsoUnbox (Foo a) (Int, AsBoxedStrictly a) where
...

Oct 10 '24 16:10 recursion-ninja

So we need to decide what sort of instances we allow. I'm more of the anything-goes persuasion, and @lehins thinks that unboxed vectors should reduce stored elements to normal form. It's a nice property and all existing instances have it.

If this discussion hadn't started I'd have just used a boxed vector. But reduction to NF is nice and in line with the naming, so I'm weakly in favor of requiring it. @lehins, @bodigrim what's your opinion?

Oct 10 '24 16:10 Shimuuar

Let's go back to the use case: it's to be able to have vectors of types like this:

data Foo a = Foo !Int a

or

data Bar a = Bar !Int !a

So we may want strict or lazy fields, but either way the semantics of a constructor are only to force arguments to WHNF. There's no notion of normal form fields (nor should there be).

So that would argue for having BoxedStrict or BoxedLazy to be explicit about the above choice. And the strict one should be WHNF only, not NF. It would not be useful for the intended use case if we required NF. It would also be very expensive to deepseq on each array write.
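
To make the WHNF versus NF distinction concrete, here is a small sketch using only base and deepseq. The names partial and throwsWhen are illustrative. Forcing to WHNF stops at the outer constructor, while forcing to NF also evaluates the payload.

```haskell
import Control.DeepSeq   (force)
import Control.Exception (ErrorCall, evaluate, try)

-- The outer constructor can be evaluated, but the payload is an error thunk.
partial :: Maybe Int
partial = Just (error "inner thunk")

-- True when evaluating the argument to WHNF raises an ErrorCall.
throwsWhen :: a -> IO Bool
throwsWhen x = do
  r <- try (evaluate x)
  pure (either (const True :: ErrorCall -> Bool) (const False) r)

main :: IO ()
main = do
  whnf <- throwsWhen partial          -- WHNF only: stops at Just
  nf   <- throwsWhen (force partial)  -- NF: deepseq reaches the thunk
  print (whnf, nf)                    -- prints (False,True)
```

A strict constructor field such as !a behaves like the first case; the second case is what a deepseq on every array write would do, and its cost grows with the size of the stored value.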

Oct 16 '24 09:10 dcoutts

@lehins says:

However, with respect to the Unbox instance for the Boxed type, I think it would be useful to enforce NFData on the a in Boxed a, because one thing Unboxed vectors give us is the guarantee that all their elements are in NF.

but it's also true that unboxed vectors give us the guarantee that all elements are in WHNF, simply because for all the existing unbox instances WHNF == NF. So it's perfectly reasonable for a generalisation to be to WHNF.

And as I note above, the main use case is to unbox constructors with a mixture of field types, including unlifted.

Let me give a concrete use case. I want an unboxed vector of this type:

data IOOp m = IOOpRead  !Fd !FileOffset !(MutableByteArray (PrimState m)) !Int !ByteCount
            | IOOpWrite !Fd !FileOffset !(MutableByteArray (PrimState m)) !Int !ByteCount

This obviously wants to be represented by 5 vectors of bool/int/word etc, and one vector of boxed things. (Of course, even better in this use case is boxed but unlifted, but that's a separate feature).

Oct 16 '24 09:10 dcoutts

@Shimuuar writes:

I can see two possible names: AsLazyBoxed/AsStrictBoxed, using As as a reference, or UnboxVia{Strict,Lazy}Boxed, which is based on UnboxViaPrim but is long and weird. So the former is probably better.

I like it. In fact I like the latter ones, UnboxVia{Strict,Lazy}Boxed. I think long names here are fine because they're not used often, just in deriving via declarations. And having "UnboxVia..." as a name prefix makes their use case nice and clear, and makes them relatively discoverable. The "As" type while cute and short is quite hard to discover, precisely because of its short generic name. I only stumbled across it because it was next to UnboxViaPrim in the haddock docs.

My personal favourite:

UnboxViaBoxed{Lifted,Unlifted}

This follows the naming convention of GHC's RuntimeRep with BoxedRep Levity and levity being Lifted or Unlifted.

Oct 16 '24 09:10 dcoutts

because for all the existing unbox instances WHNF == NF

But that's only true for primitive types, not for tuples and records with lazy fields which use standard instances.

Oct 16 '24 10:10 Shimuuar

What records with lazy fields? Yes, the tuples are being used as an intermediate type for the IsoUnbox for user records, so that doesn't reflect any "original" lazy fields from the user type. But even there, there's no NF, it's just Unbox again.

Oct 16 '24 13:10 dcoutts

What I meant is that storing, say, a tuple (Double,Int) will evaluate it to NF, not just to WHNF. The same would hold for data Foo = Foo Double Int if it uses the same representation as tuples.

Oct 16 '24 22:10 Shimuuar

@dcoutts, @Shimuuar; not to pile on to the bikeshedding too much, but might I suggest UnboxViaBoxed{Strictly,Lazily} or UnboxViaBoxing{Strictly,Lazily} instead of UnboxVia{Strict,Lazy}Boxed. I think that they read equally clearly as English, but the "stable prefix" of UnboxViaBox{ed/ing} and the "variable suffix" of {Strictly,Lazily} makes detecting semantic differences easier when scanning through similar code segments. Your thoughts?

Oct 17 '24 09:10 recursion-ninja

Yes, I think the prefix and suffix are good, whether it's strict/lazy or lifted/unlifted. I don't mind about "ing" "ed" etc.

Oct 17 '24 10:10 dcoutts

My personal opinion is that Unbox instances can already be implemented in any way a user wishes, because there are currently no formal laws that describe what Unbox really is. So, if people are hesitant about my suggestion of keeping elements in NF or even WHNF, then so be it. This ticket is concerned with providing a mechanism for easily deriving Unbox instances for product types that contain other types that can't be unboxed. I totally understand that this can be very useful, but we need to clearly indicate what these deriving mechanisms are doing and document that we leave the decision up to the user whether the whole of the type will be unboxed or not.

In my opinion UnboxViaBox* sounds like a tautology and I am really against this naming. The naming should correctly reflect what is happening: namely, there will be no unboxing, potentially with some strictness implications. Therefore I suggest adding these three newtype wrappers that users can choose from when they are deriving Unbox:

newtype DoNotUnbox a - use a lazy boxed vector for underlying implementation
newtype DoNotUnboxWHNF a - use a strict boxed vector for underlying implementation
newtype DoNotUnboxNF a - use a strict boxed vector that forces elements to NF with deepseq on writes for underlying implementation.

With all that said, I leave the final decision to @Shimuuar on how it should be implemented and/or named, since he was the person who came up with this newtype deriving for Unbox in the first place.

@dcoutts With regards to lifted vs unlifted, I believe that should be a separate ticket and a separate discussion, because in my opinion vector should already provide Unbox instances out of the box for all types that can have an unlifted instance: Array, ByteArray, MutableByteArray, IORef, etc. There are a few issues we'd have to discuss: e.g. how do we deal with GHC versions that do not support levity polymorphism, how do we treat the state token for mutable types, etc.

Oct 17 '24 19:10 lehins

Let's then go with the "allow the user to decide" approach and provide all three variants of wrappers. And I agree we do need better documentation explaining what unboxed vectors are and what exactly the wrappers do.

The only question left is naming. It looks like no one objects to the UnboxVia+<suffix> theme. And DoNotUnbox is much better than UnboxViaBox (very weird!).

newtype DoNotUnboxLazy a
newtype DoNotUnboxStrict a
newtype DoNotUnboxNF a

I think picking a lazy vector would be a less common choice than a strict one, so it's better to be explicit about it, and there's no counterpart to WHNF.

Regarding naming, the lazy/strict pair is better than lifted/unlifted. It's widely used in the ecosystem: containers, transformers, this package.

Oct 19 '24 19:10 Shimuuar

@Shimuuar I like the names you suggested:

newtype DoNotUnboxLazy a
newtype DoNotUnboxStrict a
newtype DoNotUnboxNF a

I've updated the names in the PR to reflect these, with the minor change of DoNotUnboxNF to DoNotUnboxNormalForm.

Oct 21 '24 15:10 recursion-ninja

Implemented in #508

Nov 01 '24 22:11 Shimuuar