TransmogrifAI icon indicating copy to clipboard operation
TransmogrifAI copied to clipboard

[WIP] Scala 2.12 / Spark 3 upgrade

Open nicodv opened this issue 3 years ago • 14 comments

Related issues https://github.com/salesforce/TransmogrifAI/issues/336 https://github.com/salesforce/TransmogrifAI/issues/332

Describe the proposed solution Upgrade to Scala 2.12 and Spark 3

Describe alternatives you've considered Living in the past, suffering from security issues and missing out on feature and speed improvements

Additional context Add any other context about the changes here.

nicodv avatar Mar 18 '21 17:03 nicodv

Thanks for the contribution! Before we can merge this, we need @wsuchy @koertkuipers to sign the Salesforce.com Contributor License Agreement.

salesforce-cla[bot] avatar Mar 18 '21 17:03 salesforce-cla[bot]

@leahmcguire I think it is just not being used - https://github.com/salesforce/TransmogrifAI/pull/550#discussion_r597091426

tovbinm avatar Apr 23 '21 18:04 tovbinm

We should be careful in how we define unused in a public project. Also that functionality would be needed to migrate projects on Transmogrifai V0...

leahmcguire avatar Apr 23 '21 19:04 leahmcguire

Hey @tovbinm,there's a unit test failure I've been investigating that's the result of a bug in Spark: https://issues.apache.org/jira/browse/SPARK-34805?focusedCommentId=17337491&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17337491.

I'm wondering why testData in SanityCheckerTest.scala (L137) is constructed the way it is, with the metadata for the features column added manually. The fact that the metadata isn't passed along in DataFrame.select anymore is discovered by this assertion.

I'm assuming Spark won't fix this any time soon, and I'm having trouble finding an alternative way of putting in the metadata in the schema of testData. I've tried .withColumn, but it still relies on .select under the hood. What's your take on this?

emitc2h avatar Apr 30 '21 19:04 emitc2h

Also pinging @Jauntbox (we know you're out there!) for question above.

nicodv avatar Apr 30 '21 20:04 nicodv

This is a known issue indeed. We have been copying over the metadata between fields each time we apply our transformers, e.g OpTransformer1.transform

tovbinm avatar Apr 30 '21 20:04 tovbinm

This is a known issue indeed. We have been copying over the metadata between fields each time we apply our transformers, e.g OpTransformer1.transform

I mean that there is a new problem with Spark 3.1. Even OpTransformer1.transform is broken now since it relies on .select to pass back the metadata into the output dataframe. SelectedModelCombinerTest tests the .transform function directly and also fails for the same reason.

emitc2h avatar Apr 30 '21 20:04 emitc2h

StructField still has the metadata in it, it's just ExpressionEncoder in Spark 3.x does not allow passing it anymore. Oh, it's a true bummer. We rely heavily on this feature.

tovbinm avatar Apr 30 '21 21:04 tovbinm

Hello, any estimation on when we can get this PR ready? Thank you!

hedibejaoui avatar Sep 16 '21 17:09 hedibejaoui

@hedibejaoui , we are running internal forks of TransmogrifAI and MLeap on Spark 3.1.1, so the bulk of the work has been done.

For public release, the MLeap dependency needs to be upgraded now that they're on Spark 3 too: https://github.com/combust/mleap/pull/765

But since they've upgraded to Spark 3.0.2 and TransmogrifAI to 3.1.1, we have some testing left to do.

nicodv avatar Sep 16 '21 20:09 nicodv

@nicodv Thanks for the information. Actually, we are using Spark 3.0.x because of some internal dependencies, any chance we get a public release of TransmogrifAI for that version?

hedibejaoui avatar Sep 17 '21 09:09 hedibejaoui

Hello, When do you think this PR will merged for the public use? Thank you!

Fatma-abdel avatar Oct 19 '21 20:10 Fatma-abdel

Hi, This PR adds important functionality that I need for my project. When will this PR merge ?

Thank you

EhsanSadr avatar Oct 26 '21 15:10 EhsanSadr

Hi, we are waiting for the new PR adds. When it will be available ? Thanks

MeriamAffes avatar Oct 26 '21 16:10 MeriamAffes