spark-bigquery-connector
Unnecessary dependency on spark-mllib
To support the Spark ML types "vector" and "matrix", the SupportedCustomDataType enum was added, and it references the spark-mllib library. For code that uses only core and sql, as in my case, I don't think adding spark-mllib should be necessary.
We could avoid it by adding a helper to this enum, so that the check of whether a field's type is "vector" or "matrix" is done outside of the class.
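One possible shape for such a helper is sketched below. It is only an illustration, not the code in the connector or in the PR: the class name MlTypeNames and its methods are hypothetical, and it assumes the Spark ML types are implemented by classes named org.apache.spark.ml.linalg.VectorUDT and org.apache.spark.ml.linalg.MatrixUDT. The idea is to compare class names as strings, so nothing from spark-mllib has to be on the classpath unless a field really has one of those types.

```java
// Hypothetical helper, not part of the connector: decides whether a field's
// data type is a Spark ML vector or matrix by comparing class names as
// strings, so no spark-mllib class is referenced or loaded.
final class MlTypeNames {
    // Assumed fully qualified names of the Spark ML user-defined types.
    static final String VECTOR_UDT = "org.apache.spark.ml.linalg.VectorUDT";
    static final String MATRIX_UDT = "org.apache.spark.ml.linalg.MatrixUDT";

    private MlTypeNames() {}

    // True if the given class name is one of the Spark ML type names.
    static boolean isMlTypeName(String className) {
        return VECTOR_UDT.equals(className) || MATRIX_UDT.equals(className);
    }

    // Convenience overload: inspects the runtime class of a data type object.
    // A null argument is treated as "not an ML type".
    static boolean isMlType(Object dataType) {
        return dataType != null && isMlTypeName(dataType.getClass().getName());
    }
}
```

With a check like this, the write path could call MlTypeNames.isMlType(field.dataType()) first and only touch SupportedCustomDataType when it returns true, so jobs that use only core and sql never trigger the enum's static initializer.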
Here is the exception that I get when running save:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ml/linalg/SQLDataTypes
at com.google.cloud.spark.bigquery.SupportedCustomDataType.<clinit>(SupportedCustomDataType.java:25)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.$anonfun$updateMetadataIfNeeded$1(BigQueryWriteHelper.scala:96)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.$anonfun$updateMetadataIfNeeded$1$adapted(BigQueryWriteHelper.scala:95)
at scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:304)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
I created a PR: https://github.com/GoogleCloudDataproc/spark-bigquery-connector/pull/601