
transpose on Panel fails

Open tnielens opened this issue 3 years ago • 4 comments

import org.saddle._
Panel(Vec(1, 2, 3), Vec("hello", "world", "!")).T

throws

java.lang.ArrayStoreException
	at java.lang.System.arraycopy(Native Method)
	at org.saddle.array.package$.$anonfun$flatten$2(package.scala:623)
	at org.saddle.array.package$.$anonfun$flatten$2$adapted(package.scala:621)
	at scala.collection.immutable.Vector.foreach(Vector.scala:1856)
	at org.saddle.array.package$.flatten(package.scala:621)
	at org.saddle.scalar.ScalarTagAny.concat(ScalarTagAny.scala:64)
	at org.saddle.Frame.toMat(Frame.scala:1426)
	at org.saddle.Frame.T(Frame.scala:168)
	at repl.MdocSession$App.<init>(scalar.worksheet.sc:11)
	at repl.MdocSession$.app(scalar.worksheet.sc:3)

tnielens, Mar 20 '22 09:03

Thanks for finding this.

What do you think of Panel? In the last 10 years I never used it.

pityka, Mar 20 '22 13:03

I don't have much experience with saddle. This works fine with the regular Frame constructor:

Frame(Vec[Any](1, 2, 3), Vec[Any]("hello", "world", "!")).T

The issue is probably that the underlying arrays of Vec[Int] and Vec[String] aren't compatible for the transposition. Panel allows that by taking in Vec[_], whereas the Frame example above forces Vec[Any] and the corresponding ScalarTag.
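To illustrate the kind of incompatibility described above, here is a minimal sketch that uses only the Scala/Java standard library, not saddle itself. It assumes the failure comes from copying a primitive-backed array into an object array, which is what the `System.arraycopy` frame in the stack trace suggests; the object and helper names (`ArrayCopyDemo`, `copyFails`) are made up for this example.

```scala
object ArrayCopyDemo {
  // Returns true when copying n elements from src into dst raises
  // ArrayStoreException, false when the copy succeeds.
  def copyFails(src: AnyRef, dst: AnyRef, n: Int): Boolean =
    try { System.arraycopy(src, 0, dst, 0, n); false }
    catch { case _: ArrayStoreException => true }

  // Array[Int] is a primitive int[] on the JVM.
  val ints: Array[Int] = Array(1, 2, 3)
  // Array[String] is a reference array (String[]).
  val strings: Array[String] = Array("hello", "world", "!")
  // Array[Any] holds boxed values in an Object[].
  val boxed: Array[Any] = Array[Any](1, 2, 3)

  // int[] into Object[]: rejected by System.arraycopy.
  val primitiveIntoAny: Boolean = copyFails(ints, new Array[Any](3), 3)
  // String[] into Object[]: accepted.
  val stringsIntoAny: Boolean = copyFails(strings, new Array[Any](3), 3)
  // Object[] of boxed ints into Object[]: accepted.
  val boxedIntoAny: Boolean = copyFails(boxed, new Array[Any](3), 3)
}
```

This would be consistent with the Frame(Vec[Any](...), Vec[Any](...)) variant working: there both columns are already backed by object arrays, so no primitive-to-reference copy is attempted.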

tnielens, Mar 20 '22 15:03

> What do you think of Panel? In the last 10 years I never used it.

Just chiming in to say heterogeneous Frames are important for statistical modeling where you have categorical variables and continuous variables in the same dataset. Does this forked version of saddle support heterogeneous Frames?

bbuchsbaum, May 23 '23 21:05

No, in this fork Frame is Frame[RowIndexType, ColIndexType, ValueType], so all values in a frame must be of a single type.

For categorical variables you can work around this with one-hot encoding, which is often done downstream for analysis anyway. You can also encode the levels as doubles (just don't treat them as you would a numeric value).
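As a rough sketch of the two encodings mentioned above, written with plain Scala collections rather than saddle types: the names `CategoricalEncoding`, `levels`, `levelCodes` and `oneHot` are hypothetical and not part of saddle. The resulting Double columns could then be placed in a frame with a single value type.

```scala
object CategoricalEncoding {
  // Distinct levels, in order of first appearance.
  def levels(column: Seq[String]): Vector[String] =
    column.distinct.toVector

  // Level codes: each value is replaced by the index of its level, as a Double.
  // The codes are labels only; their numeric ordering carries no meaning.
  def levelCodes(column: Seq[String]): Vector[Double] = {
    val index = levels(column).zipWithIndex.toMap
    column.map(v => index(v).toDouble).toVector
  }

  // One-hot encoding: one 0.0/1.0 indicator column per level.
  def oneHot(column: Seq[String]): Vector[(String, Vector[Double])] =
    levels(column).map { level =>
      level -> column.map(v => if (v == level) 1.0 else 0.0).toVector
    }
}
```

For example, calling `CategoricalEncoding.oneHot(Seq("a", "b", "a"))` yields one indicator column for "a" and one for "b", each with three entries.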

I think there are many use cases for a heterogeneous, column-wise data structure, but as it is now, Frame is not that. To remove this confusion and make the code simpler to maintain, I removed those Panel constructors. I think there is a place for a new data structure that zips together heterogeneous columns.

pityka, May 24 '23 08:05