Spark 3.0 build

Open w1nk opened this issue 4 years ago • 10 comments

Hello!

We're currently using mmlspark on our Spark 2.4 clusters to awesome effect (training on 3.4 billion rows, ~600 GB of data). Thanks for all the work!

Our organization would like to migrate these clusters to Spark 3.0. We attempted to build mmlspark against Spark 3, but enough things have been renamed or relocated that the build fails with a pile of errors [1].

Has anyone successfully built mmlspark against Spark 3.0? If not, we may be able to patch it up so that it builds.

Thanks!

[1]
     mmlspark/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/HTTPSourceV2.scala:26:37: object v2 is not a member of package org.apache.spark.sql.sources
     [error] import org.apache.spark.sql.sources.v2._

This entire v2 package has been renamed / relocated, with the bulk of it seemingly in this patch: https://github.com/apache/spark/commit/053dd858d38e6107bc71e0aa3a4954291b74f8c8

w1nk, Mar 04 '20 10:03

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

welcome[bot], Mar 04 '20 10:03

Hello! Spark 3.0.0 was released about a week ago.

https://spark.apache.org/news/spark-3-0-0-released.html

Thanks!

w1nk, Jun 25 '20 09:06

Does the MMLSpark team plan to migrate to (Py)Spark 3.0 in the near future?

fengkehh, Jul 07 '20 01:07

I would also be very interested in (Py)Spark 3.0.0 and Scala 2.12 support. How can we help?

brunocous, Aug 06 '20 12:08

Just saw a related pending PR: https://github.com/Azure/mmlspark/pull/912

itechbear, Sep 04 '20 01:09

Resolved by #970, I suppose?

juanpaulo, Feb 02 '21 04:02

mmlspark doesn't work on Spark 3 yet; it throws this error: https://github.com/Azure/mmlspark/issues/891

saikiranvadhi, Feb 15 '21 14:02

The latest mmlspark on master supports Spark 3.0.

imatiach-msft, Apr 19 '21 20:04

Is it possible to get it pushed to the Maven repo so that we can install it into other clusters easily? (I created issue 1031 with this in mind, but now I see this one too.)

rgordon, Apr 26 '21 21:04

Hi @imatiach-msft, I'm trying to get the latest mmlspark master build as you suggested, but I'm getting an error:

Could not find com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-169-80889120-SNAPSHOT.
     Searched in the following locations:
     ...
       - https://mmlspark.azureedge.net/maven/com/microsoft/ml/spark/mmlspark_2.12/1.0.0-rc3-169-80889120-SNAPSHOT/maven-metadata.xml
       - https://mmlspark.azureedge.net/maven/com/microsoft/ml/spark/mmlspark_2.12/1.0.0-rc3-169-80889120-SNAPSHOT/mmlspark_2.12-1.0.0-rc3-169-80889120-SNAPSHOT.pom

I also tried the version from this comment, com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-59-bf337941-SNAPSHOT, but got the same error.

Could you please clarify which version I should use?
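
For anyone debugging a similar "Could not find" message, it can help to reproduce the URLs the resolver probes and check them by hand. The sketch below assumes the standard Maven repository layout (group ID dots become slashes, then artifact ID, then version), which matches the two paths in the error above. The helper names `parse_coordinate` and `maven_artifact_urls` are hypothetical and not part of mmlspark or any build tool, and real SNAPSHOT resolution may go on to request timestamped file names after reading `maven-metadata.xml`.

```python
def parse_coordinate(coordinate):
    """Split a 'group:artifact:version' string into its three parts."""
    group_id, artifact_id, version = coordinate.split(":")
    return group_id, artifact_id, version


def maven_artifact_urls(repo, group_id, artifact_id, version):
    """Build the metadata and POM URLs under the standard Maven layout.

    Assumption: the repository follows the conventional layout
    <repo>/<group with '/' instead of '.'>/<artifact>/<version>/...
    """
    base = "/".join(
        [repo.rstrip("/"), group_id.replace(".", "/"), artifact_id, version]
    )
    return {
        "metadata": f"{base}/maven-metadata.xml",
        "pom": f"{base}/{artifact_id}-{version}.pom",
    }


# Coordinate and repository root taken from the error message above.
coordinate = "com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-169-80889120-SNAPSHOT"
urls = maven_artifact_urls(
    "https://mmlspark.azureedge.net/maven", *parse_coordinate(coordinate)
)

for name, url in urls.items():
    print(f"{name}: {url}")
```

If neither URL is reachable in a browser or with a plain HTTP client, that version was most likely never published to that repository, so the fix is a different version or repository rather than a change to the build configuration.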

avanunts, Sep 08 '21 11:09