frameless
The right way to convert a column?
Hello, I am starting with Frameless and I am having a hard time converting my code based on Spark DataFrames to Frameless. The blocking point I have reached is how to override a column.
Let's say I have a dataframe with col1, col2, ..., myColumn. myColumn is a String, and it was exported from a database where this column is actually a Seq[String], so I now need to convert it back to its original type.
I used to do:
df.withColumn("myColumn", toArray($"myColumn"))
How would you do the same thing with Frameless? Do you need two case classes, and to use withColumnTuple and dropTuple?
Hi @leobenkel, sorry I missed this question! Yes, withColumnTuple is the way to add and drop columns with Frameless. If you go with withColumn, you will need to define a new case class.
But when using withColumnTuple I am losing the names of all my columns. If I had index, feature, label and I use withColumnTuple, I expect to see index, feature, label, _1, but instead I see _1, _2, _3, _4.
@leobenkel for some reason I missed this comment, apologies! So in the case where you need to keep the types, it's better to use a projection. For example, (x: X).project[B]. The catch here is that you will need to define a new type B. The docs have more examples on this.
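
For anyone landing here later, this is a rough sketch of how the pieces could fit together. It is an illustration under assumptions, not an official recipe. The case classes Raw, Parsed and IndexAndColumn are hypothetical, toArray is a stand-in for the helper in the question (assumed here to split a comma-separated string), and Vector[String] is used in place of Seq[String] because a Vector encoder is the safer bet across Frameless versions. The type conversion itself goes through deserialized.map, which is a different route from withColumnTuple; project[B] then selects fields by name into the new type.

```scala
import org.apache.spark.sql.SparkSession
import frameless.TypedDataset

// Hypothetical row types; the field names mirror the ones mentioned in the thread.
final case class Raw(index: Long, feature: Double, label: String, myColumn: String)
final case class Parsed(index: Long, feature: Double, label: String, myColumn: Vector[String])
// A smaller target type for project: same field names and types, fewer columns.
final case class IndexAndColumn(index: Long, myColumn: Vector[String])

object ConvertColumn {
  // Stand-in for the toArray helper from the question.
  // Assumption: the exported string is a comma-separated list.
  def toArray(s: String): Vector[String] =
    s.split(",").map(_.trim).filter(_.nonEmpty).toVector

  def main(args: Array[String]): Unit = {
    // TypedDataset.create needs an implicit SparkSession in scope.
    implicit val spark: SparkSession =
      SparkSession.builder().master("local[*]").appName("convert-column").getOrCreate()

    val raw: TypedDataset[Raw] =
      TypedDataset.create(Seq(Raw(1L, 0.5, "x", "a, b, c")))

    // Rebuild each row with the converted column; names are kept
    // because the result is a case class rather than a tuple.
    val parsed: TypedDataset[Parsed] =
      raw.deserialized.map(r => Parsed(r.index, r.feature, r.label, toArray(r.myColumn)))

    // project matches fields of the target type by name and type.
    val projected: TypedDataset[IndexAndColumn] = parsed.project[IndexAndColumn]

    projected.dataset.show()
    spark.stop()
  }
}
```

The trade-off is that deserialized.map runs an ordinary Scala function over each row, so Spark cannot optimize it the way it can a column expression. For a one-off conversion like this that is usually acceptable.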