algebra icon indicating copy to clipboard operation
algebra copied to clipboard

instances for vector

Open non opened this issue 9 years ago • 2 comments

Inspired by @ceedubs' PR (thanks btw) I was inspired to start thinking about vectors. There are different interpretations of vectors of different lengths. For example, are vector implicitly padded by a zero value (i.e. sparse) or not? Is the order of the values significant or not (i.e. should we prefer lexicographic orders)? Should we support pair-wise operations beyond the usual things like vector addition?

Here's what I have so far:

Eq[A] => Eq[Vector[A]]

  • Exact: every element must match
  • Sparse: trailing zeros are implied (required Monoid[A])

PartialOrder[A] => PartialOrder[Vector[A]]

  • Lexicographic: first non-equal field determines result
  • Conservative: all positions must have the "same" relationship or NaN is returned
  • (Exact vs Sparse distinction applies to both variants)
  • (Another possible impl could use distance metric)

Order[A] => Order[Vector[A]]

  • Lexicographic: first non-equal field determines result
  • (Exact vs Sparse distinction applies)
  • (Another possible impl could use distance metric)

Semigroup[A] => Semigroup[Vector[A]]

  • Strict: lengths must match or error
  • Hand-wavy: longer vector fills in tail automatically

Monoid[A] => Monoid[Vector[A]]

  • Sparse: empty vector is identity element (multiple equiv reprs -- needs consistent eq/order)
  • (Exact impl would require a "size" param, couldn't be implicit)

Group[A] => Group[Vector[A]]

  • (Same basic situation as for monoid)

Anyway, I am inclined to try to simplify the picture by making sparse/lexicographic the defaults. The nice thing there is that we can provide efficient, reasonable definitions implicitly without hassling the user. The downside is that it could mask potential size errors. I've had situations where a pairwise Field[Vector[A]] would have been useful, but I think it would be more idiomatic to use a case class or tuple for that situation in Scala, so I'm happy to leave that alone for now.

I will look at how Algebird currently does this. @johnynek, what are your thoughts here?

non avatar Jun 07 '15 22:06 non

We did the sparse approach, and we don't have Order, so we didn't face the ordering issue.

For the sparse definition, we never really got into any problems, since most algorithms do use vectors of the same length. It is a bit weird that the zero becomes the zero length vector, and you get into the issue of wondering if you should trim trailing zeros.

Mathematically, one almost never uses a lexigraphical sorting, but sometimes we want just some kind of order, so I guess it's okay. The distance-based one is interesting, as that could actually be useful: given a point and a Metric, construct an ordering (first on distance to the point, then lexigraphic to break ties, or something). This could be useful in picking the min of a bunch of points in some kind of search algorithm.

Also, we did add Metric and VectorSpace to algebird: https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/Metric.scala https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/VectorSpace.scala#L47

but we probably only used them a few times.

johnynek avatar Jun 08 '15 19:06 johnynek

Sorry for the long delay in my response. I've had a lot going on.

When I created the PR for Vector instances, it was actually because I wanted it for reasonably-performant appends for the "log" part of a WriterT that I could use with Cats. Currently there is no DList or double-ended queue built for Cats. At the time I didn't think about the fact that people who actually do real math might want a very different monoid for Vectors, but it makes sense.

I'm happy to either change #50 or close it if someone else would rather have control of it. I'll await advice from someone else on how to go forward.

ceedubs avatar Jun 20 '15 13:06 ceedubs