jaeger-operator
[Bug]: Cassandra spark-dependencies seems to be broken
What happened?
When running the cassandra-spark E2E test, the pod from the spark job fails:
k logs test-spark-deps-spark-dependencies-28508319-z4cmv
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/app/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" java.io.IOException: Failed to open native connection to Cassandra at {10.244.0.19}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.createSession(CassandraConnector.scala:168)
at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$sessionCache$1(CassandraConnector.scala:154)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:79)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:122)
at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:332)
at com.datastax.spark.connector.cql.Schema$.tableFromCassandra(Schema.scala:352)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider.tableDef(CassandraTableRowReaderProvider.scala:50)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider.tableDef$(CassandraTableRowReaderProvider.scala:50)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef$lzycompute(CassandraTableScanRDD.scala:63)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef(CassandraTableScanRDD.scala:63)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider.verify(CassandraTableRowReaderProvider.scala:137)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider.verify$(CassandraTableRowReaderProvider.scala:136)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.verify(CassandraTableScanRDD.scala:63)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:263)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:294)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:290)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:294)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:290)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:294)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:290)
at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4(Partitioner.scala:78)
at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4$adapted(Partitioner.scala:78)
at scala.collection.immutable.List.map(List.scala:293)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:78)
at org.apache.spark.rdd.PairRDDFunctions.$anonfun$groupByKey$6(PairRDDFunctions.scala:636)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:410)
at org.apache.spark.rdd.PairRDDFunctions.groupByKey(PairRDDFunctions.scala:636)
at org.apache.spark.api.java.JavaPairRDD.groupByKey(JavaPairRDD.scala:561)
at io.jaegertracing.spark.dependencies.cassandra.CassandraDependenciesJob.run(CassandraDependenciesJob.java:169)
at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:60)
at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
Caused by: java.lang.NoClassDefFoundError: com/codahale/metrics/JmxReporter
at com.datastax.driver.core.Metrics.<init>(Metrics.java:146)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1501)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:451)
at com.datastax.spark.connector.cql.CassandraConnector$.createSession(CassandraConnector.scala:161)
... 41 more
Caused by: java.lang.ClassNotFoundException: com.codahale.metrics.JmxReporter
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source)
at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
... 45 more
Steps to reproduce
Run the test
Expected behavior
.
Relevant log output
No response
Screenshot
No response
Additional context
No response
Jaeger backend version
No response
SDK
No response
Pipeline
No response
Storage backend
No response
Operating system
No response
Deployment model
No response
Deployment configs
No response
It seems something changed recently in the image, and that is breaking the operator integration.
Same issue here; we use jaeger-operator v1.49.0.
The last successful spark job was 20 days ago:
NAME COMPLETIONS DURATION AGE
jaeger-operator-jaeger-cassandra-schema-job 1/1 37s 601d
jaeger-operator-jaeger-spark-dependencies-28491835 1/1 32s 22d
jaeger-operator-jaeger-spark-dependencies-28493275 1/1 36s 21d
jaeger-operator-jaeger-spark-dependencies-28494715 1/1 37s 20d
jaeger-operator-jaeger-spark-dependencies-28523515 0/1 12h 12h
The operator's spark job has no tag on the image, so "latest" is used as a fallback:
Containers:
jaeger-operator-jaeger-spark-dependencies:
Image: ghcr.io/jaegertracing/spark-dependencies/spark-dependencies
Port: <none>
Host Port: <none>
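To double-check which image reference the CronJob is actually configured with, something like the following works; the CronJob name is taken from the job listing above and may differ per install (add -n <namespace> if needed):

# Prints the container image configured on the spark-dependencies CronJob
kubectl get cronjob jaeger-operator-jaeger-spark-dependencies \
  -o jsonpath='{.spec.jobTemplate.spec.template.spec.containers[0].image}'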
The image that is used can be found here:
https://github.com/jaegertracing/spark-dependencies/pkgs/container/spark-dependencies%2Fspark-dependencies/versions?filters%5Bversion_type%5D=tagged
Sadly, there is only one tag, and it was overwritten 20 days ago, which makes any workaround impossible 😔
Any ideas? In my opinion, at least the old image should be provided.
We might open an issue in that repository.
As a workaround, we pin spark-dependencies to the old untagged image sha256:683963b95bafb0721f3261a49c368c7bdce4ddcb04a23116c45068d254c5ec11.
We use the Helm values of the jaeger-operator to override the Docker image of the dependencies in the storage section:
jaeger:
  create: true
  spec:
    strategy: production
    storage:
      type: cassandra
      options:
        cassandra:
          servers: xxx
          keyspace: jaeger
          username: xxx
          password: xxx
      dependencies:
        image: ghcr.io/jaegertracing/spark-dependencies/spark-dependencies@sha256:683963b95bafb0721f3261a49c368c7bdce4ddcb04a23116c45068d254c5ec11
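Rolling this out is an ordinary values override; a minimal sketch, assuming the chart was installed from the jaegertracing Helm repo under the release name jaeger-operator in the observability namespace (both names are placeholders for your install):

# Apply the values file containing the pinned dependencies image
helm upgrade jaeger-operator jaegertracing/jaeger-operator \
  --namespace observability \
  -f values.yaml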
However, the current image is broken and should be fixed. In my opinion, the jaeger-operator itself should also pin its own dependencies to avoid this kind of error in production.
@rriverak would you like to send a PR?
@iblancasa I'm not sure. We can solve this on several levels... which solution are we looking for? Then I could provide a PR accordingly.
- Fix Spark dependencies: the 'latest' image should work again with the operator, and better versioning should take place. After that, adapt the new spark-dependencies versioning to the operator logic.
- Adjust jaeger-operator: the image could be set strictly in the flags as a default. By the way, the problem with 'latest' on spark-dependencies seems to be known.
- Adjust Helm charts: the image could be set in the jaeger-operator chart values (https://github.com/jaegertracing/helm-charts/blob/d4e163e1311df2596aabb7db1aedc05169a71396/charts/jaeger-operator/values.yaml#L40) and in the jaeger chart values (https://github.com/jaegertracing/helm-charts/blob/d4e163e1311df2596aabb7db1aedc05169a71396/charts/jaeger/values.yaml#L737).
What is our path?
I would be happy if spark-dependencies shows initiative here, fixes the problems with the image, and switches to proper versioning. If that does not happen, then one of the remaining two solutions must do the job.
I would prefer the "Fix Spark dependencies" option. After that one, set the version in the Jaeger operator. The third one is not a real solution since a lot of people are not using Helm to install the operator.
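For reference, for installs that don't go through Helm, the same digest pin can be applied directly in the Jaeger CR as a workaround; a sketch assuming the field layout mirrored by the Helm values above (instance name and Cassandra options are placeholders):

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger
spec:
  strategy: production
  storage:
    type: cassandra
    options:
      cassandra:
        servers: xxx
        keyspace: jaeger
    dependencies:
      # Pin to the last known-good digest instead of relying on the 'latest' tag
      image: ghcr.io/jaegertracing/spark-dependencies/spark-dependencies@sha256:683963b95bafb0721f3261a49c368c7bdce4ddcb04a23116c45068d254c5ec11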