spark-operator

SparkApplication stuck forever?

Open AlejandroUPC opened this issue 1 year ago • 1 comments

I have a Spark application that connects to a streaming service and consumes data, but the SparkApplication has been stuck with no state for a long time:

NAME            STATUS   ATTEMPTS   START   FINISH   AGE
**redacted**                                        5m16s

When checking the driver logs, all I see:

I1108 08:51:43.924523      10 controller.go:184] SparkApplication **redacted**/**redacted** was added, enqueuing it for submission

No pod is being created other than the operator's, and I am completely blind here. How can I debug this?

Thanks

Edit: After a while it crashed, but the error message shows only warnings?

failed to run spark-submit for SparkApplication **redacted**/**redacted**: 
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
https://repo1.maven.org/ added as a remote repository with the name: repo-1
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
com.microsoft.azure#azure-eventhubs-spark_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-a29b89a6-57af-47bc-a8c0-3e49d685c8f7;1.0
	confs: [default]
	found com.microsoft.azure#azure-eventhubs-spark_2.12;2.3.22 in central
	found com.microsoft.azure#azure-eventhubs;3.3.0 in central
	found org.apache.qpid#proton-j;0.33.8 in central
	found com.microsoft.azure#qpid-proton-j-extensions;1.2.4 in central
	found org.slf4j#slf4j-api;1.7.30 in central
	found com.microsoft.azure#azure-client-authentication;1.7.3 in central
	found com.microsoft.azure#azure-client-runtime;1.7.3 in central
	found com.microsoft.rest#client-runtime;1.7.3 in central
	found com.google.guava#guava;24.1.1-jre in central
	found com.google.code.findbugs#jsr305;1.3.9 in central
	found org.checkerframework#checker-compat-qual;2.0.0 in central
	found com.google.errorprone#error_prone_annotations;2.1.3 in central
	found com.google.j2objc#j2objc-annotations;1.1 in central
	found org.codehaus.mojo#animal-sniffer-annotations;1.14 in central
	found com.squareup.retrofit2#retrofit;2.7.2 in central
	found com.squareup.okhttp3#okhttp;3.12.6 in central
	found com.squareup.okio#okio;1.15.0 in central
	found com.squareup.okhttp3#logging-interceptor;3.12.2 in central
	found com.squareup.okhttp3#okhttp-urlconnection;3.12.2 in central
	found com.squareup.retrofit2#converter-jackson;2.7.2 in central
	found com.fasterxml.jackson.core#jackson-databind;2.10.1 in central
	found com.fasterxml.jackson.core#jackson-annotations;2.10.1 in central
	found com.fasterxml.jackson.core#jackson-core;2.10.1 in central
	found com.fasterxml.jackson.datatype#jackson-datatype-joda;2.10.1 in central
	found joda-time#joda-time;2.9.9 in central
	found org.apache.commons#commons-lang3;3.4 in central
	found io.reactivex#rxjava;1.3.8 in central
	found com.squareup.retrofit2#adapter-rxjava;2.7.2 in central
	found com.microsoft.azure#azure-annotations;1.10.0 in central
	found commons-codec#commons-codec;1.11 in central
	found com.microsoft.azure#adal4j;1.6.4 in central
	found com.nimbusds#oauth2-oidc-sdk;6.5 in central
	found com.sun.mail#javax.mail;1.6.1 in central
	found javax.activation#activation;1.1 in central
	found com.github.stephenc.jcip#jcip-annotations;1.0-1 in central
	found net.minidev#json-smart;2.3 in central
	[2.3] net.minidev#json-smart;[1.3.1,2.3]
	found net.minidev#accessors-smart;1.2 in central
	found org.ow2.asm#asm;5.0.4 in central
	found com.nimbusds#lang-tag;1.7 in central
	[1.7] com.nimbusds#lang-tag;[1.4.3,)
	found com.google.code.gson#gson;2.8.0 in central
	found com.nimbusds#nimbus-jose-jwt;9.8.1 in central
	found org.scala-lang.modules#scala-java8-compat_2.12;0.9.0 in central
:: resolution report :: resolve 44200ms :: artifacts dl 2200ms
	:: modules in use:
	com.fasterxml.jackson.core#jackson-annotations;2.10.1 from central in [default]
	com.fasterxml.jackson.core#jackson-core;2.10.1 from central in [default]
	com.fasterxml.jackson.core#jackson-databind;2.10.1 from central in [default]
	com.fasterxml.jackson.datatype#jackson-datatype-joda;2.10.1 from central in [default]
	com.github.stephenc.jcip#jcip-annotations;1.0-1 from central in [default]
	com.google.code.findbugs#jsr305;1.3.9 from central in [default]
	com.google.code.gson#gson;2.8.0 from central in [default]
	com.google.errorprone#error_prone_annotations;2.1.3 from central in [default]
	com.google.guava#guava;24.1.1-jre from central in [default]
	com.google.j2objc#j2objc-annotations;1.1 from central in [default]
	com.microsoft.azure#adal4j;1.6.4 from central in [default]
	com.microsoft.azure#azure-annotations;1.10.0 from central in [default]
	com.microsoft.azure#azure-client-authentication;1.7.3 from central in [default]
	com.microsoft.azure#azure-client-runtime;1.7.3 from central in [default]
	com.microsoft.azure#azure-eventhubs;3.3.0 from central in [default]
	com.microsoft.azure#azure-eventhubs-spark_2.12;2.3.22 from central in [default]
	com.microsoft.azure#qpid-proton-j-extensions;1.2.4 from central in [default]
	com.microsoft.rest#client-runtime;1.7.3 from central in [default]
	com.nimbusds#lang-tag;1.7 from central in [default]
	com.nimbusds#nimbus-jose-jwt;9.8.1 from central in [default]
	com.nimbusds#oauth2-oidc-sdk;6.5 from central in [default]
	com.squareup.okhttp3#logging-interceptor;3.12.2 from central in [default]
	com.squareup.okhttp3#okhttp;3.12.6 from central in [default]
	com.squareup.okhttp3#okhttp-urlconnection;3.12.2 from central in [default]
	com.squareup.okio#okio;1.15.0 from central in [default]
	com.squareup.retrofit2#adapter-rxjava;2.7.2 from central in [default]
	com.squareup.retrofit2#converter-jackson;2.7.2 from central in [default]
	com.squareup.retrofit2#retrofit;2.7.2 from central in [default]
	com.sun.mail#javax.mail;1.6.1 from central in [default]
	commons-codec#commons-codec;1.11 from central in [default]
	io.reactivex#rxjava;1.3.8 from central in [default]
	javax.activation#activation;1.1 from central in [default]
	joda-time#joda-time;2.9.9 from central in [default]
	net.minidev#accessors-smart;1.2 from central in [default]
	net.minidev#json-smart;2.3 from central in [default]
	org.apache.commons#commons-lang3;3.4 from central in [default]
	org.apache.qpid#proton-j;0.33.8 from central in [default]
	org.checkerframework#checker-compat-qual;2.0.0 from central in [default]
	org.codehaus.mojo#animal-sniffer-annotations;1.14 from central in [default]
	org.ow2.asm#asm;5.0.4 from central in [default]
	org.scala-lang.modules#scala-java8-compat_2.12;0.9.0 from central in [default]
	org.slf4j#slf4j-api;1.7.30 from central in [default]
	:: evicted modules:
	org.slf4j#slf4j-api;1.7.28 by [org.slf4j#slf4j-api;1.7.30] in [default]
	org.slf4j#slf4j-api;1.7.22 by [org.slf4j#slf4j-api;1.7.30] in [default]
	com.nimbusds#nimbus-jose-jwt;[6.0.1,) by [com.nimbusds#nimbus-jose-jwt;9.8.1] in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   45  |   2   |   0   |   3   ||   42  |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-a29b89a6-57af-47bc-a8c0-3e49d685c8f7
	confs: [default]
	0 artifacts copied, 42 already retrieved (0kB/600ms)
23/11/08 07:55:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/11/08 07:55:10 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
23/11/08 07:55:19 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
23/11/08 07:55:21 WARN DriverCommandFeatureStep: spark.kubernetes.pyspark.pythonVersion was deprecated in Spark 3.1. Please set 'spark.pyspark.python' and 'spark.pyspark.driver.python' configurations or PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables instead.
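
The output above mixes JVM notices, Ivy dependency resolution, and Spark's own log lines, which makes it hard to see how far spark-submit got. Below is a minimal sketch that keeps only the Spark log lines. It assumes the `yy/MM/dd HH:mm:ss LEVEL Logger: message` layout visible in the last lines of the output; `spark_log_lines` is a hypothetical helper name, not part of Spark or the operator.

```python
import re

# Assumed layout, taken from the lines quoted above, e.g.
# "23/11/08 07:55:05 WARN NativeCodeLoader: Unable to load ..."
SPARK_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ([A-Z]+) ([\w.$]+): (.*)$"
)


def spark_log_lines(text):
    """Return (timestamp, level, logger, message) for each Spark log line.

    Lines that do not match the assumed layout (JVM "WARNING:" notices,
    Ivy resolution output, blank lines) are skipped.
    """
    rows = []
    for line in text.splitlines():
        match = SPARK_LINE.match(line)
        if match:
            rows.append(match.groups())
    return rows
```

Applied to the output in this post, the last entry returned would be the `DriverCommandFeatureStep` deprecation warning, which suggests spark-submit was still building the driver spec when the output ends. That is a reading of the log, not a confirmed diagnosis.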

AlejandroUPC, Nov 08 '23 08:11

I have the same issue on Kubernetes v28.

JavadHosseini, Mar 09 '24 15:03

I have faced the same problem. In my case, the likely cause was that several SparkApplications with the same name were submitted at the same time. You should check the Spark operator pod's logs for more information.
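
One way to look for that pattern is to count how often each application is enqueued in the operator log. This is a minimal sketch, assuming the log was saved to text (for example with `kubectl logs` on the operator pod) and that the message wording matches the `controller.go` line quoted in the original post. Other operator versions may word it differently, and `count_enqueues` and `repeated` are hypothetical helper names.

```python
import re
from collections import Counter

# Wording assumed from the operator log line quoted in this issue:
# "SparkApplication <namespace>/<name> was added, enqueuing it for submission"
ENQUEUE = re.compile(
    r"SparkApplication (\S+)/(\S+) was added, enqueuing it for submission"
)


def count_enqueues(log_text):
    """Count 'was added' events per (namespace, name) in operator log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ENQUEUE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def repeated(counts):
    """Keep only the applications that were enqueued more than once."""
    return {app: n for app, n in counts.items() if n > 1}
```

An application that shows up in `repeated(...)` was added more than once during the period the log covers, which would be consistent with same-name submissions colliding. It is only a hint to investigate further, not proof of the cause.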

voducdan, Jul 23 '24 08:07