emma
Bag <> Matrix Transformations
This issue should be used to discuss the transformation methods for bag <> matrix and bag <> vector.
The initial prototype and ongoing effort is tracked in PR #194.
For now I am focusing on the matrix <> bag transformations; vector <> bag will follow.
In the description below, A is a type bound by spire.Numeric.
Bag to Matrix
- DataBag[Product]: Transforms a bag of products into a matrix in which the number of rows is the number of products in the bag and the number of columns is the arity of the product type. All elements of a product must have the same type, bound by spire.Numeric (we have to cast explicitly here).
- DataBag[Vector[A]]: Analogous to DataBag[Product], except that we can enforce the numeric type in the signature.
- DataBag[(Int, Int, A)] with explicit number of rows and columns: Transforms a bag of (rowIndex, colIndex, value) triplets into a (possibly) sparse matrix. The dimensions of the matrix are defined explicitly by the user.
- DataBag[(Int, Int, A)] with implicit number of rows and columns: Analogous to the previous transformation, except that the number of rows and columns is defined by the largest row and column indexes among the triplets in the bag.
- DataBag[Product] / DataBag[Vector[A]] with key and value extractor methods [not there]: I would like to have these transformations, where the user defines two functions that extract the key and the value from each product/vector in the bag. This is easy for vectors (since we have the numeric type), but it gets very ugly for products, because the user has to work with raw products and is also responsible for casting the elements inside the functions.
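To make the semantics of the product and triplet variants concrete, here is a minimal sketch. It is not the code from PR #194: Seq stands in for DataBag, Array[Array[A]] stands in for the matrix type, scala.math.Numeric replaces spire.Numeric, and all method names are hypothetical. It also assumes 0-based indexes, so the inferred dimensions are the largest index plus one.

```scala
import scala.reflect.ClassTag

// Assumptions: Seq plays the role of DataBag, Array[Array[A]] the role of the
// matrix, and scala.math.Numeric replaces spire.Numeric. Names are hypothetical.
object BagToMatrixSketch {

  // DataBag[Product] -> matrix: one row per product, one column per field.
  // Product fields are untyped (Any), hence the explicit cast. Every field
  // must already be an A, otherwise this fails at runtime.
  def fromProducts[A: Numeric: ClassTag](bag: Seq[Product]): Array[Array[A]] =
    bag.map(p => p.productIterator.map(_.asInstanceOf[A]).toArray).toArray

  // DataBag[(Int, Int, A)] -> matrix with explicit dimensions.
  // Cells that no triplet addresses keep the numeric zero.
  def fromTriplets[A: Numeric: ClassTag](
      bag: Seq[(Int, Int, A)], rows: Int, cols: Int): Array[Array[A]] = {
    val m = Array.fill(rows, cols)(implicitly[Numeric[A]].zero)
    for ((r, c, v) <- bag) m(r)(c) = v
    m
  }

  // Same as above, but the dimensions are inferred from the largest row and
  // column indexes in the bag (0-based, so size = max index + 1).
  def fromTripletsInferred[A: Numeric: ClassTag](
      bag: Seq[(Int, Int, A)]): Array[Array[A]] = {
    val rows = if (bag.isEmpty) 0 else bag.map(_._1).max + 1
    val cols = if (bag.isEmpty) 0 else bag.map(_._2).max + 1
    fromTriplets(bag, rows, cols)
  }
}
```

For example, `BagToMatrixSketch.fromTriplets(Seq((0, 1, 5.0)), 2, 3)` yields a 2 x 3 matrix, while `fromTripletsInferred` on the same bag yields a 1 x 2 matrix. The dense array is only for illustration; a real implementation would presumably pick a sparse representation.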
Matrix to Bag
- matrix to DataBag[Vector[A]]: Transforms a matrix into a bag of vectors.
- matrix to DataBag[(Int, Vector[A])]: Transforms a matrix into a bag of (rowIndex, Vector[A]) pairs.
- matrix to DataBag[(B, Vector[A])] [not there]: This seems interesting to me in case someone wants labeled vectors, where B can be an arbitrary type produced by, e.g., a function label[B]: (rowIdx: Int, v: Vector[A]) => B.
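The three matrix to bag variants can be sketched the same way. Again this is only an illustration under assumptions: Seq stands in for DataBag, Array[Array[A]] for the matrix, each vector is taken to be one matrix row, and the method names are hypothetical. The labeled variant simply builds on the indexed one.

```scala
// Assumptions: Seq plays the role of DataBag, Array[Array[A]] the role of the
// matrix, and each emitted vector is one matrix row. Names are hypothetical.
object MatrixToBagSketch {

  // matrix -> DataBag[Vector[A]]
  def toRows[A](m: Array[Array[A]]): Seq[Vector[A]] =
    m.toSeq.map(_.toVector)

  // matrix -> DataBag[(Int, Vector[A])]
  def toIndexedRows[A](m: Array[Array[A]]): Seq[(Int, Vector[A])] =
    m.toSeq.zipWithIndex.map { case (row, i) => (i, row.toVector) }

  // matrix -> DataBag[(B, Vector[A])], with a user-defined label function
  // that sees both the row index and the row itself.
  def toLabeledRows[A, B](m: Array[Array[A]])(
      label: (Int, Vector[A]) => B): Seq[(B, Vector[A])] =
    toIndexedRows(m).map { case (i, v) => (label(i, v), v) }
}
```

A call such as `MatrixToBagSketch.toLabeledRows(m)((i, v) => s"row-$i")` would attach a String label to every row. Putting the label function in its own parameter list lets the compiler infer B from the function alone.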
Open Questions
- Naming of the methods (sadly, type erasure prevents us from using the same method name for all of them)
- Is there need for additional transformation methods?
- I would suggest moving the methods into matrix and bag once we have agreed on the API, instead of keeping them in a separate Transformations object.
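On the naming question: overloads that take DataBag[Vector[A]] and DataBag[(Int, Int, A)] erase to the same JVM signature, which is why the compiler rejects them under one name. One possible workaround, sketched below, is a type class that is selected by the element type, so a single method name suffices. This is only a suggestion for discussion, not what PR #194 does; Seq again stands in for DataBag, scala.math.Numeric for spire.Numeric, and ToMatrix and toMatrix are hypothetical names.

```scala
import scala.reflect.ClassTag

// Assumptions: Seq plays the role of DataBag, Array[Array[A]] the role of the
// matrix, scala.math.Numeric replaces spire.Numeric. Names are hypothetical.
object ErasureSketch {

  // Describes how to turn a bag with elements of type E into a matrix of A.
  trait ToMatrix[E, A] {
    def apply(bag: Seq[E]): Array[Array[A]]
  }

  object ToMatrix {
    // Instance for bags of vectors: one row per vector.
    implicit def fromVectors[A: ClassTag]: ToMatrix[Vector[A], A] =
      new ToMatrix[Vector[A], A] {
        def apply(bag: Seq[Vector[A]]): Array[Array[A]] =
          bag.map(_.toArray).toArray
      }

    // Instance for bags of (row, col, value) triplets with inferred,
    // 0-based dimensions; unaddressed cells keep the numeric zero.
    implicit def fromTriplets[A: ClassTag](
        implicit num: Numeric[A]): ToMatrix[(Int, Int, A), A] =
      new ToMatrix[(Int, Int, A), A] {
        def apply(bag: Seq[(Int, Int, A)]): Array[Array[A]] = {
          val rows = if (bag.isEmpty) 0 else bag.map(_._1).max + 1
          val cols = if (bag.isEmpty) 0 else bag.map(_._2).max + 1
          val m = Array.fill(rows, cols)(num.zero)
          for ((r, c, v) <- bag) m(r)(c) = v
          m
        }
      }
  }

  // One method name; the element type picks the instance at compile time,
  // so erasure of the bag's type parameter no longer causes a clash.
  def toMatrix[E, A](bag: Seq[E])(implicit conv: ToMatrix[E, A]): Array[Array[A]] =
    conv(bag)
}
```

The trade-off is that unsupported element types fail with a missing-implicit error rather than a missing-overload error, and variants that need extra arguments (such as explicit dimensions) would still need their own entry point.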
Vector to Bag / Bag to Vector
Once I have an initial version of these transformations, I'll add them here.
Merged initial transformations in #194