Andrei Neagoe
This seems to be due to leftover cgroups. It's not clear to me which component is responsible for properly cleaning up pod cgroups on teardown (though I'd imagine it's CRI-O). More...
@lbogdan thanks for the insight. `kubelet` doesn't seem to mind if the slice is also cleaned up and I wasn't sure if it's going to clean it up after the...
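To see whether pod cgroups really are left behind after teardown, something like the following could help. It is only a sketch: it assumes the systemd cgroup driver, where pod slices are named `kubepods[-<qos>]-pod<uid>.slice` with the dashes in the pod UID replaced by underscores, and the helper name `leftover_pod_slices` is made up for illustration. It only reports directories and never removes anything.

```python
import re
from pathlib import Path

# Assumed naming (systemd cgroup driver): kubepods[-<qos>]-pod<uid>.slice,
# where dashes in the pod UID are replaced by underscores.
POD_SLICE = re.compile(r"^kubepods(?:-[a-z]+)?-pod([0-9a-f_]+)\.slice$")


def leftover_pod_slices(cgroup_root, live_pod_uids):
    """Return pod slice directories whose UID is not among the live pods."""
    live = {uid.replace("-", "_") for uid in live_pod_uids}
    leftovers = []
    for path in Path(cgroup_root).rglob("*.slice"):
        match = POD_SLICE.match(path.name)
        # Skip QoS-level slices (no pod UID) and slices of pods still running.
        if path.is_dir() and match and match.group(1) not in live:
            leftovers.append(path)
    return sorted(leftovers)
```

The list of live pod UIDs would have to come from the API server or the container runtime for that node; how to collect it is left out here.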
@roligupt the documentation has been updated; the way to go is to follow the https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/examples/spark-pi-configmap.yaml example.
@gaozhenhai do you have `MutatingWebhook` configured? In my testing, this worked as expected. It is also confirmed by @roligupt in his original post. With the following manifest for `SparkApplication`, as...
@RonZhang724 unfortunately the tests seem to be a bit flaky. I've run the new target `local-integration-test` in a loop and it works fine. However, under the GitHub workflow, it trips...
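Running a target in a loop to look for flakiness can be sketched as below. The helper name `count_failures` is made up for illustration, and invoking the target through `make` is an assumption about how it is run locally.

```python
import subprocess


def count_failures(command, runs):
    """Run `command` `runs` times and count the non-zero exit codes."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Example (assumes the target is invoked through make):
# count_failures(["make", "local-integration-test"], runs=20)
```

A clean local loop does not rule out flakiness in CI, since the workflow runners may differ in timing and available resources.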
Please double-check the permissions of the files to ensure Spark can actually read them. Also, try it out with the spark-pi example:
```
image: gcr.io/spark-operator/spark:v3.1.1
imagePullPolicy: Always
mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
mainClass: org.apache.spark.examples.SparkPi...
```
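For the permissions check, a rough sketch along these lines could be run against the directory holding the application files. It assumes the Spark process runs as a user that is neither the owner nor in the group of the files, so only the "other" permission bits matter; the helper name `not_world_readable` is made up for illustration.

```python
import stat
from pathlib import Path


def not_world_readable(root):
    """List entries under `root` that a non-owner, non-group user cannot read."""
    blocked = []
    for path in sorted(Path(root).rglob("*")):
        mode = path.stat().st_mode
        if stat.S_ISDIR(mode):
            # Directories need both read and execute to be listed and traversed.
            needed = stat.S_IROTH | stat.S_IXOTH
        else:
            needed = stat.S_IROTH
        if mode & needed != needed:
            blocked.append(path)
    return blocked
```

If the container runs as the file owner or as root, this check is stricter than necessary and an empty result is not required for things to work.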
I doubt it has anything to do with the operator; this is just an orchestrator putting together the resources based on the spec you give it. It doesn't really care...
So on both clusters you're using the exact same images? Please also check logs from both operators and describe both driver pods to ensure the same version is used.
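One way to compare the two driver pods is to save each as JSON (for example with `kubectl get pod <name> -o json`) and diff the container images. The sketch below works on the parsed JSON; the helper names are made up for illustration, and it only reads the standard `spec.initContainers` and `spec.containers` fields.

```python
import json


def load_pod(path):
    """Load a pod manifest saved as JSON."""
    with open(path) as handle:
        return json.load(handle)


def container_images(pod):
    """Map container name to image for init and regular containers."""
    spec = pod.get("spec", {})
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    return {c["name"]: c["image"] for c in containers}


def image_differences(pod_a, pod_b):
    """Return {container: (image_a, image_b)} for containers whose images differ."""
    a, b = container_images(pod_a), container_images(pod_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Matching tags do not guarantee identical image contents; comparing `status.containerStatuses[].imageID` from the same JSON is a stricter check.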
It would be great if you could provide a way to reproduce this with a generic application, so anyone can test and help troubleshoot. Yes, it's fine to use operator...
So... running operator with image `v1beta2-1.3.3-3.1.1` and specifying 2.4.5 image for the driver and worker pods works fine? That would absolve the operator of any suspicion... Perhaps it's worth trying...