table doesn't exist in 2.23.15
Describe the Bug
I started Zipkin against an already existing Cassandra instance, with CASSANDRA_ENSURE_SCHEMA set. The logs indicate the migrations are run:
:: version 2.23.15 :: commit 63365bb ::
2021-12-19 20:15:59.014 INFO [/] 1 --- [oss-http-*:9411] c.l.a.s.Server : Serving HTTP at /[0:0:0:0:0:0:0:0%0]:9411 - http://127.0.0.1:9411/
2021-12-19 20:16:39.527 INFO [/] 1 --- [cking-tasks-1-1] z.s.c.Schema : Installing schema /zipkin2-schema.cql for keyspace zipkin2
2021-12-19 20:16:45.237 INFO [/] 1 --- [cking-tasks-1-1] z.s.c.Schema : Installing indexes /zipkin2-schema-indexes.cql for keyspace zipkin2
2021-12-19 20:16:48.649 INFO [/] 1 --- [ing-tasks-1-177] z.s.c.Schema : Upgrading schema /zipkin2-schema-upgrade-1.cql
2021-12-19 20:16:49.819 INFO [/] 1 --- [ing-tasks-1-177] z.s.c.Schema : Upgrading schema /zipkin2-schema-upgrade-2.cql
2021-12-19 20:16:52.123 WARN [/] 1 --- [orker-epoll-2-1] z.s.i.BodyIsExceptionMessage : Unexpected error handling request.
But the install then fails for some reason. Restarting Zipkin doesn't make it try to recreate the schema, so it just remains in a failed state and I'm not sure how to fix it.
Attempting to run any query results in the following error:
com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: table span_by_service does not exist
at com.datastax.oss.driver.api.core.servererrors.InvalidQueryException.copy(InvalidQueryException.java:48) ~[java-driver-core-4.11.3.jar:?]
at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149) ~[java-driver-core-4.11.3.jar:?]
at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:59) ~[java-driver-core-4.11.3.jar:?]
at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:31) ~[java-driver-core-4.11.3.jar:?]
at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230) ~[java-driver-core-4.11.3.jar:?]
at com.datastax.oss.driver.api.core.cql.SyncCqlSession.prepare(SyncCqlSession.java:224) ~[java-driver-core-4.11.3.jar:?]
at zipkin2.storage.cassandra.SelectServiceNames$Factory.<init>(SelectServiceNames.java:34) ~[zipkin-storage-cassandra-2.23.15.jar:?]
at zipkin2.storage.cassandra.CassandraSpanStore.<init>(CassandraSpanStore.java:86) ~[zipkin-storage-cassandra-2.23.15.jar:?]
at zipkin2.storage.cassandra.CassandraStorage.spanStore(CassandraStorage.java:164) ~[zipkin-storage-cassandra-2.23.15.jar:?]
at zipkin2.storage.cassandra.CassandraStorage.serviceAndSpanNames(CassandraStorage.java:176) ~[zipkin-storage-cassandra-2.23.15.jar:?]
at zipkin2.server.internal.ZipkinQueryApiV2.getRemoteServiceNames(ZipkinQueryApiV2.java:117) ~[classes/:?]
at com.linecorp.armeria.internal.server.annotation.AnnotatedService.invoke(AnnotatedService.java:391) ~[armeria-1.13.4.jar:?]
at com.linecorp.armeria.internal.server.annotation.AnnotatedService.lambda$serve0$8(AnnotatedService.java:359) ~[armeria-1.13.4.jar:?]
at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$Completion.run(Unknown Source) ~[?:?]
at com.linecorp.armeria.common.RequestContext.lambda$makeContextAware$3(RequestContext.java:547) ~[armeria-1.13.4.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.70.Final.jar:4.1.70.Final]
at java.lang.Thread.run(Unknown Source) [?:?]
Steps to Reproduce
Steps to reproduce the behavior:
Start dockerized zipkin and point it to an existing cassandra instance. Here are my environment vars:
STORAGE_TYPE=cassandra3
CASSANDRA_ENSURE_SCHEMA=true
CASSANDRA_CONTACT_POINTS=cassandra
CASSANDRA_USERNAME=cassandra
CASSANDRA_PASSWORD=password
Expected Behaviour
- Zipkin should be resilient and try to fix/continue a failed migration.
- Zipkin could output more helpful error messages when failing a migration.
- Failed migrations should crash Zipkin so operators know there is a problem.
- Migrations shouldn't fail.
I have the same issue with version 2.23.19. Any news? It doesn't work with a newly installed Cassandra cluster.
I've hit the same issue with Zipkin 3.1 using Docker compose:
version: '3.2'
services:
  cassandra:
    image: cassandra:4.1.3
    environment:
      # Limit memory usage.
      - MAX_HEAP_SIZE=128M
      - HEAP_NEWSIZE=24M
  zipkin:
    image: openzipkin/zipkin:3.1
    ports:
      - 127.0.0.1:9411:9411
    environment:
      - STORAGE_TYPE=cassandra3
      - CASSANDRA_CONTACT_POINTS=cassandra
I got the same error about the table span_by_service not existing, and tried to create it manually in cqlsh:
cqlsh> CREATE TABLE IF NOT EXISTS zipkin2.span_by_service (
[...]
SyntaxException: Unknown property 'dclocal_read_repair_chance'
So it looks like Zipkin does not support Cassandra 4.1 (even though the root README says "The Cassandra component [...] is tested against the latest patch of Cassandra 4.1").
Changing image: cassandra:4.1.3 to cassandra:3.11.9 fixes my issue.
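For anyone trying the same manual install on Cassandra 4.x, one workaround for the SyntaxException above is to drop the rejected table options before pasting the statement into cqlsh. The following is only a sketch: `strip_removed_options` is a hypothetical helper, the table `zipkin2.example` is made up for the usage line, and it assumes the options appear as `AND <option> = <number>` clauses. `read_repair_chance` is included on the assumption that Cassandra 4 rejects it alongside `dclocal_read_repair_chance`.

```python
import re

# Table options that Cassandra 4.x rejects. dclocal_read_repair_chance comes from
# the error above; read_repair_chance is assumed to be rejected the same way.
_REMOVED_OPTIONS = ("dclocal_read_repair_chance", "read_repair_chance")

# Matches clauses such as "AND read_repair_chance = 0", including the
# whitespace in front of them.
_PATTERN = re.compile(
    r"\s+AND\s+(?:%s)\s*=\s*[0-9.]+" % "|".join(_REMOVED_OPTIONS),
    re.IGNORECASE,
)


def strip_removed_options(cql: str) -> str:
    """Return the CQL text without the removed table options."""
    return _PATTERN.sub("", cql)


# Usage with a made-up table definition (not the real Zipkin schema):
example = """CREATE TABLE IF NOT EXISTS zipkin2.example (
    id text PRIMARY KEY
)
    WITH compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND read_repair_chance = 0
    AND dclocal_read_repair_chance = 0.1
    AND gc_grace_seconds = 3600;"""
cleaned = strip_removed_options(example)
```

This only gets the statement past the parser. It does not address why the automatic install left the table missing in the first place.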
We run tests with Cassandra 4.1.4, but our test image pre-installs the schema. Maybe there is a glitch in schema auto-install: https://github.com/openzipkin/zipkin/blob/master/docker/test-images/zipkin-cassandra/install.sh#L207
Here are the integration tests, ITEnsureSchema. This is kicked off by ITCassandraStorageHeavy on every PR that affects java.
The CREATE TABLE statement I used is the one from https://github.com/openzipkin/zipkin/blob/9acae647cc9e6f28b0ff8429cb1883cfe33c628c/zipkin-storage/cassandra/src/main/resources/zipkin2-schema-indexes.cql#L52. I assumed it is the one used by auto-install and that the error was just hidden (I didn't see it in the logs from the zipkin container).
OK I can reproduce the issue now. I think maybe we have to harden our tests for the mixed-case schema install thing.
version: '3'
services:
  cassandra:
    image: cassandra:4.1.3
    environment:
      # Limit memory usage.
      - MAX_HEAP_SIZE=128M
      - HEAP_NEWSIZE=24M
  zipkin:
    image: ghcr.io/openzipkin/zipkin:master
    command: --logging.level.zipkin2.storage.cassandra.Schema=INFO
    ports:
      - 127.0.0.1:9411:9411
    environment:
      - LOGGING_LEVEL_ZIPKIN2=INFO
      - STORAGE_TYPE=cassandra3
      - CASSANDRA_CONTACT_POINTS=cassandra
I don't get an error on schema install, rather afterwards when I post spans to it.
zipkin-1 | 2024-02-29T00:26:39.491Z INFO [/] 1 --- [king-tasks-2-37] z.s.c.Schema : Detected Cassandra version 4.1.3
zipkin-1 | 2024-02-29T00:26:39.495Z WARN [/] 1 --- [orker-epoll-3-1] z.s.i.BodyIsExceptionMessage : Unexpected error handling request.
zipkin-1 |
zipkin-1 | com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: table span_by_service does not exist
zipkin-1 | at com.datastax.oss.driver.api.core.servererrors.InvalidQueryException.copy(InvalidQueryException.java:50) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1 | at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:151) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1 | at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:61) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1 | at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:33) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1 | at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:232) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1 | at com.datastax.oss.driver.api.core.cql.SyncCqlSession.prepare(SyncCqlSession.java:226) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1 | at zipkin2.storage.cassandra.SelectServiceNames$Factory.<init>(SelectServiceNames.java:34) ~[zipkin-storage-cassandra-3.1.1-SNAPSHOT.jar:?]
zipkin-1 | at zipkin2.storage.cassandra.CassandraSpanStore.<init>(CassandraSpanStore.java:86) ~[zipkin-storage-cassandra-3.1.1-SNAPSHOT.jar:?]
zipkin-1 | at zipkin2.storage.cassandra.CassandraStorage.spanStore(CassandraStorage.java:167) ~[zipkin-storage-cassandra-3.1.1-SNAPSHOT.jar:?]
zipkin-1 | at zipkin2.server.internal.ZipkinQueryApiV2.getTraces(ZipkinQueryApiV2.java:147) ~[classes/:?]
zipkin-1 | at com.linecorp.armeria.internal.server.annotation.AnnotatedService.invoke(AnnotatedService.java:382) ~[armeria-1.27.2.jar:?]
zipkin-1 | at com.linecorp.armeria.internal.server.annotation.AnnotatedService.lambda$serve1$8(AnnotatedService.java:351) ~[armeria-1.27.2.jar:?]
zipkin-1 | at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:?]
zipkin-1 | at java.base/java.util.concurrent.CompletableFuture$Completion.run(Unknown Source) ~[?:?]
zipkin-1 | at com.linecorp.armeria.common.DefaultContextAwareRunnable.run(DefaultContextAwareRunnable.java:45) ~[armeria-1.27.2.jar:?]
zipkin-1 | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
zipkin-1 | at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
zipkin-1 | at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
zipkin-1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
zipkin-1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
zipkin-1 | at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.106.Final.jar:4.1.106.Final]
zipkin-1 | at java.base/java.lang.Thread.run(Unknown Source) [?:?]
So, this doesn't happen with our example image, and I didn't see anything about a failure in the Cassandra logs. I wonder if the schema update isn't visible by default on the current connection? I'll dig deeper. https://github.com/openzipkin/zipkin/tree/master/docker/examples
OK, it is because, starting in Cassandra 4, enable_sasi_indexes is disabled by default in the normal image. Things won't function without this setting. I'll look for a way to enable it via a flag or, worst case, update the yaml in the docs.
It looks like there's still no way to override Cassandra settings without changing the yaml file: https://github.com/docker-library/cassandra/pull/233#issuecomment-1022803135
So it would be better to at least fail if Cassandra is started without SASI enabled while search is enabled.
The main issue is that SASI is disabled by default in Cassandra 4, so you need to find a way to enable it (probably by bringing your own yaml). If this is just for testing, you can of course use our image, which already does this.
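As an illustration of "bring your own yaml", the sketch below rewrites the text of a cassandra.yaml so that the SASI setting is on. You would then mount the result into the container or bake it into a derived image. `enable_sasi` is a hypothetical helper, and the key name `enable_sasi_indexes` is the one mentioned in this thread. The cassandra.yaml shipped with your version may spell it differently, so the key is a parameter rather than hard-coded.

```python
import re


def enable_sasi(yaml_text: str, key: str = "enable_sasi_indexes") -> str:
    """Return cassandra.yaml text with `key` set to true.

    Replaces the first existing (possibly commented-out) line for the key,
    or appends a new line if the key is absent.
    """
    pattern = re.compile(r"^#?[ \t]*%s[ \t]*:.*$" % re.escape(key), re.MULTILINE)
    line = f"{key}: true"
    if pattern.search(yaml_text):
        return pattern.sub(line, yaml_text, count=1)
    # Key not present: append it, adding a newline first if the file lacks one.
    sep = "" if yaml_text.endswith("\n") or not yaml_text else "\n"
    return f"{yaml_text}{sep}{line}\n"


# Usage on a tiny made-up config:
patched = enable_sasi("cluster_name: test\nenable_sasi_indexes: false\n")
```

Because the setting is read when Cassandra starts, the patched file has to be in place before the node boots. Changing it afterwards requires a restart.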
Currently, I don't know of a way to detect at runtime whether SASI is enabled, in order to create a better message when it isn't. Someone can teach me a way or, better yet, contribute one (I personally have never used Cassandra in prod, I am just trying to help anyway).
Sadly, there aren't any active Cassandra experts on this project anymore. Plus, I have failed to get even small patches to the Cassandra code landed. So I would suggest contacting someone in the Cassandra ecosystem directly about what seems to be a very common problem: there's no way to enable SASI by default again without hassle. You will probably be told no (like everyone else), as they believe yaml should be the only way and properties disallowed :shrug:
I think we can check this heuristically. Basically, the schema install fails silently, so we can double-check for things that should have been installed if SASI were enabled, but weren't: https://github.com/openzipkin/zipkin/pull/3741
@PierreF 3.1.1, which is on the way out, will be explicit about the error in Cassandra (swallowed by the health check, I think). The main thing is that we cannot change the SASI settings, as they are fixed and must be in the cassandra.yaml before Cassandra starts: https://github.com/openzipkin/zipkin/pull/3741
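The heuristic described here can be sketched as a set comparison between the schema objects that should exist and the ones that actually do. This is not the code in the linked pull request: `missing_schema_objects` and `sasi_hint` are hypothetical names, and apart from `span_by_service` (the table from the error in this thread) the object names in the usage lines are placeholders. In practice the installed names would come from the cluster, for example by reading `system_schema.tables` for the keyspace.

```python
from typing import Iterable, List


def missing_schema_objects(expected: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Return expected object names that are not installed, sorted, case-insensitive."""
    installed_set = {name.lower() for name in installed}
    return sorted(name for name in expected if name.lower() not in installed_set)


def sasi_hint(missing: List[str]) -> str:
    """Build an operator-facing message; empty string when nothing is missing."""
    if not missing:
        return ""
    return (
        "Schema objects missing after install: "
        + ", ".join(missing)
        + ". If this is Cassandra 4 or later, check that SASI indexes are "
        "enabled in cassandra.yaml before Cassandra starts."
    )


# Usage with placeholder names (only span_by_service is taken from this thread):
missing = missing_schema_objects(
    expected=["span", "span_by_service", "placeholder_index_table"],
    installed=["SPAN", "dependency"],
)
message = sasi_hint(missing)
```

Reporting the message at startup instead of on the first query would turn the silent failure into the explicit error asked for earlier in the thread.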
That works for me. The issue I had was that Zipkin wasn't working with Cassandra 4 for no visible reason. Now there is an error message with instructions to fix it.
Thanks for your quick response.