Change spark-tensorflow-connector dependency to Spark 3.0.0 preview
Change spark-tensorflow-connector to depend on spark-3.0.0-preview2.
Test:
cd $PROJ_HOME/hadoop
mvn clean install # build tensorflow-hadoop:1.10.0 and install into local repo
cd $PROJ_HOME/spark/spark-tensorflow-connector
mvn clean install
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
📝 Please visit https://cla.developers.google.com/ to sign.
Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.
What to do if you already signed the CLA
Individual signers
- It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.
Corporate signers
- Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
- The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
- The email used to register you as an authorized contributor must also be attached to your GitHub account.
ℹ️ Googlers: Go here for more info.
@googlebot I signed it!
@jhseu Could you help review it? Thanks!
Just run mvn clean install under the directory spark/spark-tensorflow-connector to verify the PR's correctness.
Btw, why is there no Jenkins test?
When I try to build this, I'm hitting:
[ERROR] Failed to execute goal on project spark-tensorflow-connector_2.11: Could not resolve dependencies for project org.tensorflow:spark-tensorflow-connector_2.11:jar:1.10.0: Could not find artifact org.tensorflow:tensorflow-hadoop:jar:1.10.0 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
It looks like this tries to get a tensorflow-hadoop version which matches the spark-tensorflow-connector version. Is that intentional (given that tensorflow-hadoop is on version 1.14.0, whereas spark-tensorflow-connector is on version 1.10.0)?
@jkbradley Yes, the project version is 1.10, so it depends on tensorflow-hadoop:1.10.0.
The default Maven repo only includes tensorflow-hadoop versions >= 1.11, so we need to enter the hadoop directory and build it first; the commands are:
cd $PROJ_HOME/hadoop
mvn clean install # build tensorflow-hadoop:1.10.0 and install into local repo
cd $PROJ_HOME/spark/spark-tensorflow-connector
mvn clean install
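For reference, the connector's pom pins tensorflow-hadoop to the project version, roughly like the sketch below (from memory, not copied verbatim; the actual declaration may use a different property name):

<dependency>
  <groupId>org.tensorflow</groupId>
  <artifactId>tensorflow-hadoop</artifactId>
  <!-- resolves to 1.10.0 here, which is why Maven Central can't provide it -->
  <version>${project.version}</version>
</dependency>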
Whoops, my bad, did not realize it's in the same project & is a manually handled dependency. Thanks!
Since this project's CI isn't running, I tested this PR locally. There may be some flakiness in the impl or tests right now: I ran the tests once (mvn clean install) and hit the following failure, but then I ran them again (mvn test) and they passed, and a third run (mvn clean install) passed as well.
Failure in LocalWriteSuite:
- should write data locally *** FAILED ***
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, c02w81rbhtd5.attlocal.net, executor driver): java.lang.IllegalStateException: LocalPath /var/folders/y_/_46df7ns1cn8dj_6hrs2fdxm0000gp/T/spark-connector-propagate2230735357410018221 already exists. SaveMode: ErrorIfExists.
at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.writePartitionLocal(DefaultSource.scala:182)
at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.mapFun$1(DefaultSource.scala:212)
at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.$anonfun$writePartitionLocalFun$1(DefaultSource.scala:214)
at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.$anonfun$writePartitionLocalFun$1$adapted(DefaultSource.scala:214)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:455)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:1979)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1967)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1966)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1966)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:946)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:946)
at scala.Option.foreach(Option.scala:407)
...
Cause: java.lang.IllegalStateException: LocalPath /var/folders/y_/_46df7ns1cn8dj_6hrs2fdxm0000gp/T/spark-connector-propagate2230735357410018221 already exists. SaveMode: ErrorIfExists.
at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.writePartitionLocal(DefaultSource.scala:182)
at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.mapFun$1(DefaultSource.scala:212)
at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.$anonfun$writePartitionLocalFun$1(DefaultSource.scala:214)
at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.$anonfun$writePartitionLocalFun$1$adapted(DefaultSource.scala:214)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
...
Also, one nit: the artifact name in the pom should be updated from spark-tensorflow-connector_2.11 to spark-tensorflow-connector_2.12.
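i.e., in spark/spark-tensorflow-connector/pom.xml, roughly:

<!-- current -->
<artifactId>spark-tensorflow-connector_2.11</artifactId>
<!-- should become, since Spark 3.0 builds against Scala 2.12 -->
<artifactId>spark-tensorflow-connector_2.12</artifactId>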
I'm not opposed to this, but wouldn't it be better to wait until Spark 3.0.0 is released?
@jhseu After we verify correctness, we can keep this PR open so there's less work for users who want to try out the Spark 3.0 preview with spark-tensorflow-connector.
Yeah, I don't mind keeping this open.
@jkbradley The flaky test is fixed; you can retest it. The pom artifact name is also updated to _2.12.
@WeichenXu123 Could you explain the test flakiness? Is it related to the Spark 3.0 upgrade? If not, let's submit the fix in a separate PR so it can go in.
@mengxr Not related to Spark 3.0. Created a new PR with some explanation here: https://github.com/tensorflow/ecosystem/pull/144
@jhseu If we do not plan to make a new release that is 2.4-compatible, shall we review and merge this PR?
Hi, we would like to use this library with Spark 2.4 and Scala 2.12.10. Would it be possible to support multiple versions via multiple Maven profiles? I should probably create an issue, but just wanted to ask here as well.
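Something like the rough sketch below is what I have in mind; the profile id, property names, and versions here are made up for illustration, not taken from the project's actual pom:

<properties>
  <scala.binary.version>2.12</scala.binary.version>
  <spark.version>3.0.0</spark.version>
</properties>

<profiles>
  <!-- hypothetical profile for a Spark 2.4 build; Spark 2.4.x also publishes Scala 2.12 artifacts -->
  <profile>
    <id>spark-2.4</id>
    <properties>
      <spark.version>2.4.5</spark.version>
    </properties>
  </profile>
</profiles>

Dependencies would then reference e.g. spark-core_${scala.binary.version} with version ${spark.version}, and the 2.4 build could be selected with mvn clean install -Pspark-2.4.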
Now that Spark 3.0.0 is released, I think we also need https://mvnrepository.com/artifact/org.tensorflow/spark-tensorflow-connector_2.12 to be published.