
Change spark-tensorflow-connector dependency to be spark 3.0.0 preview

Open WeichenXu123 opened this issue 5 years ago • 17 comments

Change spark-tensorflow-connector to depend on Spark 3.0.0-preview2. To test:

cd $PROJ_HOME/hadoop
mvn clean install  # build tensorflow-hadoop:1.10.0 and install into local repo

cd $PROJ_HOME/spark/spark-tensorflow-connector
mvn clean install

WeichenXu123 avatar Oct 09 '19 03:10 WeichenXu123

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

googlebot avatar Oct 09 '19 03:10 googlebot

@googlebot I signed it!

WeichenXu123 avatar Oct 09 '19 04:10 WeichenXu123

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

googlebot avatar Oct 09 '19 04:10 googlebot

@jhseu Could you help review it? Thanks! Just run mvn clean install under the spark/spark-tensorflow-connector directory to verify the PR's correctness. Btw, why is there no Jenkins test?

WeichenXu123 avatar Oct 09 '19 04:10 WeichenXu123

When I try to build this, I'm hitting:

[ERROR] Failed to execute goal on project spark-tensorflow-connector_2.11: Could not resolve dependencies for project org.tensorflow:spark-tensorflow-connector_2.11:jar:1.10.0: Could not find artifact org.tensorflow:tensorflow-hadoop:jar:1.10.0 in central (https://repo.maven.apache.org/maven2) -> [Help 1]

It looks like this tries to get a tensorflow-hadoop version which matches the spark-tensorflow-connector version. Is that intentional (given that tensorflow-hadoop is on version 1.14.0, whereas spark-tensorflow-connector is on version 1.10.0)?

jkbradley avatar Oct 10 '19 18:10 jkbradley

@jkbradley Yes, the project version is 1.10.0, so it depends on tensorflow-hadoop:1.10.0.

The default Maven repo only includes tensorflow-hadoop versions >= 1.11, so we need to build it from the hadoop directory first; the commands are:
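A plausible sketch of how such a dependency is declared in the connector's pom (the exact pom contents here are an assumption, not copied from the project): using ${project.version} ties the tensorflow-hadoop dependency to the connector's own version, which is why version 1.10.0 is resolved.

```xml
<!-- Hypothetical fragment: illustrates the version coupling, not the actual pom. -->
<dependency>
  <groupId>org.tensorflow</groupId>
  <artifactId>tensorflow-hadoop</artifactId>
  <!-- Resolves to 1.10.0 because the project version is 1.10.0. -->
  <version>${project.version}</version>
</dependency>
```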

cd $PROJ_HOME/hadoop
mvn clean install  # build tensorflow-hadoop:1.10.0 and install into local repo

cd $PROJ_HOME/spark/spark-tensorflow-connector
mvn clean install

WeichenXu123 avatar Oct 11 '19 14:10 WeichenXu123

Whoops, my bad. I did not realize it's in the same project and is a manually handled dependency. Thanks!

jkbradley avatar Oct 11 '19 22:10 jkbradley

Since this project's CI isn't running, I tested this PR locally. There may be some flakiness in the implementation or tests right now: I ran the tests once (mvn clean install) and hit the following failure, but a second run (mvn test) passed, and a third run (mvn clean install) passed as well.

Failure in LocalWriteSuite:

- should write data locally *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, c02w81rbhtd5.attlocal.net, executor driver): java.lang.IllegalStateException: LocalPath /var/folders/y_/_46df7ns1cn8dj_6hrs2fdxm0000gp/T/spark-connector-propagate2230735357410018221 already exists. SaveMode: ErrorIfExists.
	at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.writePartitionLocal(DefaultSource.scala:182)
	at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.mapFun$1(DefaultSource.scala:212)
	at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.$anonfun$writePartitionLocalFun$1(DefaultSource.scala:214)
	at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.$anonfun$writePartitionLocalFun$1$adapted(DefaultSource.scala:214)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:455)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:1979)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1967)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1966)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1966)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:946)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:946)
  at scala.Option.foreach(Option.scala:407)
  ...
  Cause: java.lang.IllegalStateException: LocalPath /var/folders/y_/_46df7ns1cn8dj_6hrs2fdxm0000gp/T/spark-connector-propagate2230735357410018221 already exists. SaveMode: ErrorIfExists.
  at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.writePartitionLocal(DefaultSource.scala:182)
  at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.mapFun$1(DefaultSource.scala:212)
  at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.$anonfun$writePartitionLocalFun$1(DefaultSource.scala:214)
  at org.tensorflow.spark.datasources.tfrecords.DefaultSource$.$anonfun$writePartitionLocalFun$1$adapted(DefaultSource.scala:214)
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  ...
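The failure above is consistent with ErrorIfExists-style save semantics colliding with a task re-run: if an earlier attempt left the local directory behind, the next attempt refuses to write. A minimal self-contained sketch of that failure mode (this is illustrative code, not the actual connector implementation; writePartitionLocal here is a hypothetical stand-in for the method in the stack trace):

```scala
import java.nio.file.{Files, Path}

object ErrorIfExistsSketch {
  // Hypothetical stand-in for DefaultSource.writePartitionLocal: with
  // ErrorIfExists semantics, any pre-existing path aborts the write.
  def writePartitionLocal(localPath: Path): Unit = {
    if (Files.exists(localPath)) {
      throw new IllegalStateException(
        s"LocalPath $localPath already exists. SaveMode: ErrorIfExists.")
    }
    Files.createDirectories(localPath)
    // ... write TFRecord files into localPath ...
  }

  def main(args: Array[String]): Unit = {
    val partDir = Files
      .createTempDirectory("spark-connector-propagate")
      .resolve("part-00001")
    writePartitionLocal(partDir) // first attempt succeeds
    try {
      writePartitionLocal(partDir) // a retried attempt hits the leftover dir
    } catch {
      case e: IllegalStateException =>
        println(s"retry failed: ${e.getMessage}")
    }
  }
}
```

This would explain why a clean rebuild sometimes passes: whether the temp directory survives between attempts depends on timing and cleanup.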

Also, one nit: the artifact name in the pom should be updated from spark-tensorflow-connector_2.11 to spark-tensorflow-connector_2.12.

jkbradley avatar Oct 15 '19 18:10 jkbradley

I'm not opposed to this, but wouldn't it be better to wait until Spark 3.0.0 is released?

jhseu avatar Oct 15 '19 22:10 jhseu

@jhseu After we verify correctness, we can keep this PR open so there is less work for users who want to try out the Spark 3.0 preview with spark-tensorflow-connector.

mengxr avatar Oct 15 '19 22:10 mengxr

Yeah, I don't mind keeping this open.

jhseu avatar Oct 15 '19 23:10 jhseu

@jkbradley The flaky test is fixed; you could retest it. And the pom artifact is updated to _2.12.

WeichenXu123 avatar Oct 16 '19 09:10 WeichenXu123

@WeichenXu123 Could you explain the test flakiness? Is it relevant to Spark 3.0 upgrade? If not, let's submit another PR so the fix can go in.

mengxr avatar Oct 16 '19 16:10 mengxr

@mengxr Not relevant to Spark 3.0. I created a new PR with some explanation here: https://github.com/tensorflow/ecosystem/pull/144

WeichenXu123 avatar Oct 17 '19 02:10 WeichenXu123

@jhseu If we do not plan to make a new release that is 2.4 compatible, shall we review and merge this PR?

mengxr avatar Mar 31 '20 14:03 mengxr

Hi, we would like to use this library with Spark 2.4 and Scala 2.12.10. Would it be possible to support multiple versions via multiple Maven profiles? I should probably create an issue, but I wanted to ask here as well.
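One way such multi-version support is commonly done is with Maven profiles that override the Spark and Scala versions. A hypothetical sketch (profile ids, property names, and version numbers below are assumptions, not the project's actual pom):

```xml
<!-- Hypothetical profiles: default to Spark 3.0 / Scala 2.12, with an
     opt-in profile for Spark 2.4, selected via mvn -Pspark-2.4. -->
<properties>
  <scala.binary.version>2.12</scala.binary.version>
  <spark.version>3.0.0</spark.version>
</properties>

<profiles>
  <profile>
    <id>spark-2.4</id>
    <properties>
      <spark.version>2.4.5</spark.version>
      <scala.binary.version>2.12</scala.binary.version>
    </properties>
  </profile>
</profiles>
```

The artifactId would then reference ${scala.binary.version} so the published name tracks the selected Scala version.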

vikatskhay avatar Apr 24 '20 07:04 vikatskhay

Now that Spark 3.0.0 is released, I think we also need https://mvnrepository.com/artifact/org.tensorflow/spark-tensorflow-connector_2.12 to be released.

kangnak avatar Jun 22 '20 09:06 kangnak