transpose on Panel fails
import org.saddle._
Panel(Vec(1, 2, 3), Vec("hello", "world", "!")).T
throws
java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at org.saddle.array.package$.$anonfun$flatten$2(package.scala:623)
at org.saddle.array.package$.$anonfun$flatten$2$adapted(package.scala:621)
at scala.collection.immutable.Vector.foreach(Vector.scala:1856)
at org.saddle.array.package$.flatten(package.scala:621)
at org.saddle.scalar.ScalarTagAny.concat(ScalarTagAny.scala:64)
at org.saddle.Frame.toMat(Frame.scala:1426)
at org.saddle.Frame.T(Frame.scala:168)
at repl.MdocSession$App.<init>(scalar.worksheet.sc:11)
at repl.MdocSession$.app(scalar.worksheet.sc:3)
Thanks for finding this.
What do you think of Panel? In the last 10 years I never used it.
I don't have much experience with saddle. This works fine with the regular frame constructor:
Frame(Vec[Any](1, 2, 3), Vec[Any]("hello", "world", "!")).T
The issue is probably that the underlying arrays of Vec[Int] and Vec[String] aren't compatible for the transposition. Panel allows that combination by taking in Vec[_], whereas the Frame example above forces Vec[Any] and the corresponding ScalarTag.
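To illustrate the suspected mechanism without saddle, here is a minimal sketch in plain Scala. It assumes that the flatten step copies each column's backing array into one object-backed array with System.arraycopy, which is consistent with the stack trace but not checked against the saddle source. The names ArrayStoreDemo and flattenInto are hypothetical.

```scala
import scala.util.Try

object ArrayStoreDemo {
  // Hypothetical stand-in for a generic flatten: copy every source array
  // into one freshly allocated Array[Any] (an Object[] on the JVM).
  def flattenInto(sources: Seq[Array[_]]): Try[Array[Any]] = Try {
    val out = new Array[Any](sources.map(_.length).sum)
    var offset = 0
    sources.foreach { src =>
      // arraycopy refuses to copy between a primitive array (int[])
      // and a reference array (Object[]): ArrayStoreException.
      System.arraycopy(src, 0, out, offset, src.length)
      offset += src.length
    }
    out
  }
}
```

With an Array[Int] (a primitive int[]) as a source, the copy fails with ArrayStoreException. With Array[Any] sources the elements are already boxed, so the copy succeeds, which would match the Vec[Any] workaround above.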
Just chiming in to say heterogeneous Frames are important for statistical modeling where you have categorical variables and continuous variables in the same dataset. Does this forked version of saddle support heterogeneous Frames?
No, in this fork Frame is Frame[RowIndexType, ColIndexType, ValueType], so the values in a frame must all be of a single type.
For categorical variables you can work around this with one-hot encoding, which is often done downstream for the analysis anyway. You can also encode the levels as doubles (just don't use them as you would use a numeric value).
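A rough sketch of both encodings in plain Scala collections, independent of saddle. The object and method names (CategoricalEncoding, oneHot, levelCodes) are made up for illustration, and sorting the levels is just one possible convention.

```scala
object CategoricalEncoding {
  // One-hot: one 0.0/1.0 indicator column per distinct level,
  // with the levels in sorted order.
  def oneHot(values: Seq[String]): Seq[(String, Seq[Double])] = {
    val levels = values.distinct.sorted
    levels.map { level =>
      level -> values.map(v => if (v == level) 1.0 else 0.0)
    }
  }

  // Level codes: map each level to an arbitrary Double label.
  // The codes are identifiers, not magnitudes.
  def levelCodes(values: Seq[String]): Seq[Double] = {
    val codeOf = values.distinct.sorted.zipWithIndex.toMap
    values.map(v => codeOf(v).toDouble)
  }
}
```

Either way every resulting column holds only doubles, so it can sit next to the continuous variables in a single-typed frame.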
I think there are a lot of use cases for a heterogeneous column-wise data structure, but as it is now, Frame is not that. To remove this confusion and to make the code simpler to maintain, I removed those Panel constructors. I think there is a place for a new data structure which zips together heterogeneous columns.
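Purely as an illustration of what such a structure could look like, here is a small sketch. Nothing in it exists in saddle; HeteroColumns, Column, NumericColumn, CategoricalColumn and Table are all hypothetical names, and only two column kinds are modelled.

```scala
object HeteroColumns {
  // Each column keeps its own element type.
  sealed trait Column { def length: Int }

  final case class NumericColumn(values: Vector[Double]) extends Column {
    def length: Int = values.length
  }

  final case class CategoricalColumn(values: Vector[String]) extends Column {
    def length: Int = values.length
  }

  // Zips named columns together; the only constraint is equal length.
  final case class Table(columns: Vector[(String, Column)]) {
    require(
      columns.map(_._2.length).distinct.length <= 1,
      "all columns must have the same length"
    )

    // Returns None if the column is missing or is not numeric.
    def numeric(name: String): Option[Vector[Double]] =
      columns.collectFirst {
        case (n, NumericColumn(vs)) if n == name => vs
      }
  }
}
```

The point of the sealed trait is that the element type is tracked per column rather than per frame, so no column has to be widened to Any.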