The right way to convert a column?

Open leobenkel opened this issue 5 years ago • 3 comments

Hello, I am getting started with Frameless and I am having a hard time converting my code from Spark DataFrames to Frameless. The blocking point I have reached is how to override a column.

Let's say I have a DataFrame with columns col1, col2, ..., myColumn. myColumn is a String because it came from a database export, but the original column is actually a Seq[String], so I now need to convert it back to its type. I used to do:

df
  .withColumn("myColumn", toArray($"myColumn"))

How would you do the same thing with Frameless? Do you need two case classes, and then use withColumnTuple and dropTuple?

leobenkel avatar Sep 04 '18 16:09 leobenkel

Hi @leobenkel, sorry I missed this question! Yes, withColumnTuple is the way to add and drop columns with Frameless. If you go with withColumn, you will need to define a new case class.

imarios avatar Sep 11 '18 21:09 imarios

But when using withColumnTuple I am losing the names of all my columns. If I had index, feature, label and I use withColumnTuple, I expect to see index, feature, label, _1, but instead I see _1, _2, _3, _4.
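
For anyone trying to reproduce this, here is a minimal sketch of the difference. It assumes a Frameless version where the tupled method is spelled `withColumnTupled` (the thread writes `withColumnTuple`), and the case classes `Sample` and `SampleWithCopy` are hypothetical names made up for illustration. The result type of the tupled variant is a plain tuple, which is why the column names come back as tuple field names, while `withColumn[U]` takes the names from the target case class.

```scala
import frameless.TypedDataset
import org.apache.spark.sql.SparkSession

// Hypothetical schemas mirroring the columns named in this thread.
case class Sample(index: Long, feature: String, label: String)
case class SampleWithCopy(index: Long, feature: String, label: String, featureCopy: String)

object TupledVsNamed {
  // Local session for the sketch only; a real job would reuse its own session.
  implicit val spark: SparkSession =
    SparkSession.builder().master("local[*]").appName("tupled-vs-named").getOrCreate()

  val ds: TypedDataset[Sample] =
    TypedDataset.create(Seq(Sample(1L, "f", "l")))

  // Tupled variant: the element type is a Tuple4, so the original
  // field names (index, feature, label) are not part of the result type.
  val tupled: TypedDataset[(Long, String, String, String)] =
    ds.withColumnTupled(ds('feature))

  // Named variant: the target case class supplies the column names,
  // at the cost of declaring that case class up front.
  val named: TypedDataset[SampleWithCopy] =
    ds.withColumn[SampleWithCopy](ds('feature))
}
```

The new column here is just a copy of `feature` to keep the sketch small; any `TypedColumn` of the right type could take its place.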

leobenkel avatar Sep 11 '18 21:09 leobenkel

@leobenkel for some reason I missed this comment, apologies! In the case where you need to keep the types, it's better to use a projection, for example (x:X).project[B]. The catch here is that you will need to define a new type B. The docs have more examples on this.
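
A rough sketch of what this could look like for the original question, under several assumptions: the case classes `Raw`, `Parsed` and `Slim` are hypothetical, the exported string is assumed to be comma-separated (the thread does not say how it was encoded), and `Vector[String]` stands in for `Seq[String]` because that is a collection type Frameless is known to provide an encoder for. `project[B]` only keeps fields that already exist with the same name and type, so changing the type of `myColumn` goes through `select` plus `as[B]` instead.

```scala
import frameless.TypedDataset
import org.apache.spark.sql.SparkSession

// Hypothetical schemas: Raw is the exported form, Parsed the desired one.
case class Raw(index: Long, feature: String, myColumn: String)
case class Parsed(index: Long, feature: String, myColumn: Vector[String])
case class Slim(index: Long, feature: String)

object ConvertColumn {
  // Local session for the sketch only.
  implicit val spark: SparkSession =
    SparkSession.builder().master("local[*]").appName("convert-column").getOrCreate()

  val raw: TypedDataset[Raw] =
    TypedDataset.create(Seq(Raw(1L, "a", "x,y,z")))

  // Projection: Slim's fields are a subset of Raw's (same names, same types),
  // so the column names are preserved.
  val slim: TypedDataset[Slim] = raw.project[Slim]

  // Typed UDF that parses the string; the comma delimiter is an assumption.
  val split = raw.makeUDF((s: String) => s.split(",").toVector)

  // select yields a tuple; as[Parsed] then names the columns after Parsed's fields.
  val parsed: TypedDataset[Parsed] =
    raw.select(raw('index), raw('feature), split(raw('myColumn))).as[Parsed]
}
```

Both routes still need the target case class to be declared, which is the catch mentioned above.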

imarios avatar Nov 15 '18 03:11 imarios