Issue writing models in Synapse Spark 3.2

Open · siege089 opened this issue 5 months ago • 3 comments

I'm using Azure Synapse, and nothing I try lets me save (write) isolation-forest models. I've explicitly included spark-avro in my POM file and also loaded the spark-avro package into the Spark pool workspace.

    <properties>
        <spark.version>3.2.0</spark.version>
        <scala.version.major>2.12</scala.version.major>
        <scala.version.minor>15</scala.version.minor>
        <scala.version>${scala.version.major}.${scala.version.minor}</scala.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.linkedin.isolation-forest</groupId>
            <artifactId>isolation-forest_${spark.version}_${scala.version.major}</artifactId>
            <version>3.0.3</version>
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-avro_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>com.microsoft.azure.synapse</groupId>
            <artifactId>synapseutils_${scala.version.major}</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.jmockit</groupId>
            <artifactId>jmockit</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest_${scala.version.major}</artifactId>
        </dependency>
    </dependencies>

The job then fails while saving the model:

    2024-01-30 01:31:47,163 INFO ApplicationMaster [shutdown-hook-0]: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.sql.AnalysisException:  Failed to find data source: com.databricks.spark.avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
	at org.apache.spark.sql.errors.QueryCompilationErrors$.failedToFindAvroDataSourceError(QueryCompilationErrors.scala:1028)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
	at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:876)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:275)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)
	at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelWriter.saveImplHelper(IsolationForestModelReadWrite.scala:262)
	at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelWriter.saveImpl(IsolationForestModelReadWrite.scala:241)
	at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
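
One way to narrow this down is to check, from inside the failing job, whether the Avro data source is actually visible at runtime. As far as I understand Spark 3.x, the name `com.databricks.spark.avro` is remapped to `org.apache.spark.sql.avro.AvroFileFormat` (controlled by `spark.sql.legacy.replaceDatabricksSparkAvro.enabled`, `true` by default), and the error above is raised when that class cannot be loaded. The minimal sketch below assumes Spark resolves data sources through the thread context class loader and via `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` entries; the class name `AvroSourceCheck` is hypothetical, and this is a diagnostic, not a fix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic helper: reports whether Spark's Avro data source is on the classpath.
public class AvroSourceCheck {

    // Class that Spark 3.x is expected to substitute for "com.databricks.spark.avro"
    // (assumption: the legacy replacement flag is left at its default).
    static final String AVRO_FILE_FORMAT = "org.apache.spark.sql.avro.AvroFileFormat";

    // ServiceLoader registration file that Spark scans for data source providers.
    static final String SERVICE_FILE =
        "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister";

    /** Returns true if the class can be loaded (without initializing it) by the loader. */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Collects every provider class named in all DataSourceRegister service files. */
    static List<String> registeredProviders(ClassLoader loader) throws IOException {
        List<String> providers = new ArrayList<>();
        Enumeration<URL> files = loader.getResources(SERVICE_FILE);
        while (files.hasMoreElements()) {
            URL url = files.nextElement();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Service files may contain '#' comments and blank lines.
                    int hash = line.indexOf('#');
                    String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                    if (!name.isEmpty()) {
                        providers.add(name);
                    }
                }
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Spark looks up data sources via the thread context class loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = AvroSourceCheck.class.getClassLoader();
        }
        System.out.println(AVRO_FILE_FORMAT + " loadable: " + isLoadable(AVRO_FILE_FORMAT, loader));
        for (String provider : registeredProviders(loader)) {
            if (provider.contains("avro")) {
                System.out.println("registered Avro provider: " + provider);
            }
        }
    }
}
```

Calling `AvroSourceCheck.main` on the driver just before saving the model would show whether the spark-avro jar actually reached the driver's classpath, or whether it was packaged or loaded somewhere Spark does not look.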

siege089 · Feb 01 '24 18:02