table doesn't exist in 2.23.15

Open withinboredom opened this issue 3 years ago • 12 comments

Describe the Bug

Started Zipkin against an already existing Cassandra instance, with CASSANDRA_ENSURE_SCHEMA=true. The logs indicate the migrations ran:


                  oo
                 oooo
                oooooo
               oooooooo
              oooooooooo
             oooooooooooo
           ooooooo  ooooooo
          oooooo     ooooooo
         oooooo       ooooooo
        oooooo   o  o   oooooo
       oooooo   oo  oo   oooooo
     ooooooo  oooo  oooo  ooooooo
    oooooo   ooooo  ooooo  ooooooo
   oooooo   oooooo  oooooo  ooooooo
  oooooooo      oo  oo      oooooooo
  ooooooooooooo oo  oo ooooooooooooo
      oooooooooooo  oooooooooooo
          oooooooo  oooooooo
              oooo  oooo

     ________ ____  _  _____ _   _
    |__  /_ _|  _ \| |/ /_ _| \ | |
      / / | || |_) | ' / | ||  \| |
     / /_ | ||  __/| . \ | || |\  |
    |____|___|_|   |_|\_\___|_| \_|

:: version 2.23.15 :: commit 63365bb ::

2021-12-19 20:15:59.014  INFO [/] 1 --- [oss-http-*:9411] c.l.a.s.Server                           : Serving HTTP at /[0:0:0:0:0:0:0:0%0]:9411 - http://127.0.0.1:9411/
2021-12-19 20:16:39.527  INFO [/] 1 --- [cking-tasks-1-1] z.s.c.Schema                             : Installing schema /zipkin2-schema.cql for keyspace zipkin2
2021-12-19 20:16:45.237  INFO [/] 1 --- [cking-tasks-1-1] z.s.c.Schema                             : Installing indexes /zipkin2-schema-indexes.cql for keyspace zipkin2
2021-12-19 20:16:48.649  INFO [/] 1 --- [ing-tasks-1-177] z.s.c.Schema                             : Upgrading schema /zipkin2-schema-upgrade-1.cql
2021-12-19 20:16:49.819  INFO [/] 1 --- [ing-tasks-1-177] z.s.c.Schema                             : Upgrading schema /zipkin2-schema-upgrade-2.cql
2021-12-19 20:16:52.123  WARN [/] 1 --- [orker-epoll-2-1] z.s.i.BodyIsExceptionMessage             : Unexpected error handling request.

But it fails for some reason. Restarting Zipkin doesn't make it try to recreate the schema, so it stays in a failed state and I'm not sure how to fix it.

Attempting to run any query results in the following error:

com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: table span_by_service does not exist
        at com.datastax.oss.driver.api.core.servererrors.InvalidQueryException.copy(InvalidQueryException.java:48) ~[java-driver-core-4.11.3.jar:?]
        at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149) ~[java-driver-core-4.11.3.jar:?]
        at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:59) ~[java-driver-core-4.11.3.jar:?]
        at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:31) ~[java-driver-core-4.11.3.jar:?]
        at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230) ~[java-driver-core-4.11.3.jar:?]
        at com.datastax.oss.driver.api.core.cql.SyncCqlSession.prepare(SyncCqlSession.java:224) ~[java-driver-core-4.11.3.jar:?]
        at zipkin2.storage.cassandra.SelectServiceNames$Factory.<init>(SelectServiceNames.java:34) ~[zipkin-storage-cassandra-2.23.15.jar:?]
        at zipkin2.storage.cassandra.CassandraSpanStore.<init>(CassandraSpanStore.java:86) ~[zipkin-storage-cassandra-2.23.15.jar:?]
        at zipkin2.storage.cassandra.CassandraStorage.spanStore(CassandraStorage.java:164) ~[zipkin-storage-cassandra-2.23.15.jar:?]
        at zipkin2.storage.cassandra.CassandraStorage.serviceAndSpanNames(CassandraStorage.java:176) ~[zipkin-storage-cassandra-2.23.15.jar:?]
        at zipkin2.server.internal.ZipkinQueryApiV2.getRemoteServiceNames(ZipkinQueryApiV2.java:117) ~[classes/:?]
        at com.linecorp.armeria.internal.server.annotation.AnnotatedService.invoke(AnnotatedService.java:391) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.internal.server.annotation.AnnotatedService.lambda$serve0$8(AnnotatedService.java:359) ~[armeria-1.13.4.jar:?]
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture$Completion.run(Unknown Source) ~[?:?]
        at com.linecorp.armeria.common.RequestContext.lambda$makeContextAware$3(RequestContext.java:547) ~[armeria-1.13.4.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at java.lang.Thread.run(Unknown Source) [?:?]

Steps to Reproduce

Steps to reproduce the behavior:

Start dockerized zipkin and point it to an existing cassandra instance. Here are my environment vars:

STORAGE_TYPE=cassandra3
CASSANDRA_ENSURE_SCHEMA=true
CASSANDRA_CONTACT_POINTS=cassandra
CASSANDRA_USERNAME=cassandra
CASSANDRA_PASSWORD=password

Expected Behaviour

  1. Zipkin should be resilient and try to fix/continue a failed migration.
  2. Zipkin could output more helpful error messages when failing a migration.
  3. Failed migrations should crash Zipkin so operators know there is a problem.
  4. Migrations shouldn't fail.

withinboredom avatar Dec 19 '21 20:12 withinboredom

Same issue with version 2.23.19. Any news? It doesn't work with a newly installed Cassandra cluster.

VuiDJi avatar Dec 02 '22 00:12 VuiDJi

I've hit the same issue with Zipkin 3.1 using Docker Compose:

version: '3.2'

services:
  cassandra:
    image: cassandra:4.1.3
    environment:
    # Limit memory usage.
    - MAX_HEAP_SIZE=128M
    - HEAP_NEWSIZE=24M
  zipkin:
    image: openzipkin/zipkin:3.1
    ports:
      - 127.0.0.1:9411:9411
    environment:
      - STORAGE_TYPE=cassandra3
      - CASSANDRA_CONTACT_POINTS=cassandra

I got the same error about table span_by_service not existing, and tried to create it manually in cqlsh:

cqlsh> CREATE TABLE IF NOT EXISTS zipkin2.span_by_service (
[...]
SyntaxException: Unknown property 'dclocal_read_repair_chance'

So it looks like Zipkin does not support Cassandra 4.1 (even though the root README says "The Cassandra component [...] is tested against the latest patch of Cassandra 4.1").

Changing image: cassandra:4.1.3 to cassandra:3.11.9 fixes my issue.
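For context on the SyntaxException above: the read_repair_chance and dclocal_read_repair_chance table options were removed in Cassandra 4.0, so a CREATE TABLE statement that still sets them is rejected. Below is a minimal Python sketch, assuming the options appear as `AND <option> = <number>` clauses, that strips them before a statement is pasted into cqlsh. The function name and the shortened example table are hypothetical, and this is not necessarily what Zipkin's own schema install does.

```python
import re

# Table options removed in Cassandra 4.0. A CREATE TABLE that sets them fails
# with "Unknown property". Only the "AND <option> = <number>" form is handled;
# an option placed directly after WITH is left untouched.
_REMOVED_OPTION = re.compile(
    r"\s+AND\s+(?:dclocal_)?read_repair_chance\s*=\s*[0-9.]+", re.IGNORECASE
)


def strip_removed_options(cql: str) -> str:
    """Drop read-repair-chance clauses from a CQL statement."""
    return _REMOVED_OPTION.sub("", cql)


# Hypothetical, shortened statement for illustration only.
example = """CREATE TABLE IF NOT EXISTS zipkin2.example_table (
    id text PRIMARY KEY
)
    WITH gc_grace_seconds = 3600
    AND read_repair_chance = 0
    AND dclocal_read_repair_chance = 0
    AND speculative_retry = '95percentile';"""

print(strip_removed_options(example))
```

The remaining options are kept as they were, so the statement stays valid CQL as long as the removed options were not the first ones after WITH.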

PierreF avatar Feb 28 '24 15:02 PierreF

We run tests with Cassandra 4.1.4, but our test image pre-installs the schema. Maybe there is a glitch in schema auto-install: https://github.com/openzipkin/zipkin/blob/master/docker/test-images/zipkin-cassandra/install.sh#L207

codefromthecrypt avatar Feb 28 '24 23:02 codefromthecrypt

Here are the integration tests, ITEnsureSchema. This is kicked off by ITCassandraStorageHeavy on every PR that affects java.

codefromthecrypt avatar Feb 28 '24 23:02 codefromthecrypt

The CREATE TABLE I used is the one from https://github.com/openzipkin/zipkin/blob/9acae647cc9e6f28b0ff8429cb1883cfe33c628c/zipkin-storage/cassandra/src/main/resources/zipkin2-schema-indexes.cql#L52 I assumed it is the one used by auto-install and that the error was just hidden (I didn't see it in the logs from the zipkin container).

PierreF avatar Feb 28 '24 23:02 PierreF

OK I can reproduce the issue now. I think maybe we have to harden our tests for the mixed-case schema install thing.

version: '3'

services:
  cassandra:
    image: cassandra:4.1.3
    environment:
    # Limit memory usage.
    - MAX_HEAP_SIZE=128M
    - HEAP_NEWSIZE=24M
  zipkin:
    image: ghcr.io/openzipkin/zipkin:master
    command: --logging.level.zipkin2.storage.cassandra.Schema=INFO
    ports:
      - 127.0.0.1:9411:9411
    environment:
      - LOGGING_LEVEL_ZIPKIN2=INFO
      - STORAGE_TYPE=cassandra3
      - CASSANDRA_CONTACT_POINTS=cassandra

I don't get an error on schema install, rather afterwards when I post spans to it.

zipkin-1     | 2024-02-29T00:26:39.491Z  INFO [/] 1 --- [king-tasks-2-37] z.s.c.Schema                             : Detected Cassandra version 4.1.3
zipkin-1     | 2024-02-29T00:26:39.495Z  WARN [/] 1 --- [orker-epoll-3-1] z.s.i.BodyIsExceptionMessage             : Unexpected error handling request.
zipkin-1     | 
zipkin-1     | com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: table span_by_service does not exist
zipkin-1     | 	at com.datastax.oss.driver.api.core.servererrors.InvalidQueryException.copy(InvalidQueryException.java:50) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1     | 	at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:151) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1     | 	at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:61) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1     | 	at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:33) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1     | 	at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:232) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1     | 	at com.datastax.oss.driver.api.core.cql.SyncCqlSession.prepare(SyncCqlSession.java:226) ~[java-driver-core-4.18.0.jar:4.18.0]
zipkin-1     | 	at zipkin2.storage.cassandra.SelectServiceNames$Factory.<init>(SelectServiceNames.java:34) ~[zipkin-storage-cassandra-3.1.1-SNAPSHOT.jar:?]
zipkin-1     | 	at zipkin2.storage.cassandra.CassandraSpanStore.<init>(CassandraSpanStore.java:86) ~[zipkin-storage-cassandra-3.1.1-SNAPSHOT.jar:?]
zipkin-1     | 	at zipkin2.storage.cassandra.CassandraStorage.spanStore(CassandraStorage.java:167) ~[zipkin-storage-cassandra-3.1.1-SNAPSHOT.jar:?]
zipkin-1     | 	at zipkin2.server.internal.ZipkinQueryApiV2.getTraces(ZipkinQueryApiV2.java:147) ~[classes/:?]
zipkin-1     | 	at com.linecorp.armeria.internal.server.annotation.AnnotatedService.invoke(AnnotatedService.java:382) ~[armeria-1.27.2.jar:?]
zipkin-1     | 	at com.linecorp.armeria.internal.server.annotation.AnnotatedService.lambda$serve1$8(AnnotatedService.java:351) ~[armeria-1.27.2.jar:?]
zipkin-1     | 	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:?]
zipkin-1     | 	at java.base/java.util.concurrent.CompletableFuture$Completion.run(Unknown Source) ~[?:?]
zipkin-1     | 	at com.linecorp.armeria.common.DefaultContextAwareRunnable.run(DefaultContextAwareRunnable.java:45) ~[armeria-1.27.2.jar:?]
zipkin-1     | 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
zipkin-1     | 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
zipkin-1     | 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
zipkin-1     | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
zipkin-1     | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
zipkin-1     | 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.106.Final.jar:4.1.106.Final]
zipkin-1     | 	at java.base/java.lang.Thread.run(Unknown Source) [?:?]

codefromthecrypt avatar Feb 29 '24 00:02 codefromthecrypt

So, this doesn't happen with our example image, and I didn't see anything in the Cassandra logs about a failure. I wonder if the schema update isn't visible by default on the current connection? Will dig deeper. https://github.com/openzipkin/zipkin/tree/master/docker/examples

codefromthecrypt avatar Feb 29 '24 00:02 codefromthecrypt

OK, it is because, starting in Cassandra 4, enable_sasi_indexes is false by default in the normal image. Things won't function without this setting. I'll look for a way to enable it via a flag or, worst case, update the yaml in docs.

codefromthecrypt avatar Feb 29 '24 00:02 codefromthecrypt

Looks like there is still no way to override Cassandra settings without changing the yaml file: https://github.com/docker-library/cassandra/pull/233#issuecomment-1022803135

codefromthecrypt avatar Feb 29 '24 00:02 codefromthecrypt

So it would be better to at least fail if Cassandra is started without SASI enabled while search is enabled.

codefromthecrypt avatar Feb 29 '24 00:02 codefromthecrypt

The main issue is that SASI is disabled by default in Cassandra 4, so you need to find a way to enable it (probably bring your own yaml). If this is just for testing, of course you can use our image, which already does this.
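One way to "bring your own yaml" is to patch a copy of the stock cassandra.yaml before Cassandra starts. Below is a minimal Python sketch, not an official procedure. It handles the setting name from this thread (enable_sasi_indexes) and also sasi_indexes_enabled, which is assumed here to be the spelling used by newer 4.x config files. The function name is hypothetical.

```python
import re

# Matches an uncommented SASI setting line, whichever spelling is present.
# `sasi_indexes_enabled` is an assumption about newer config files.
_SASI_LINE = re.compile(
    r"^([ \t]*)(enable_sasi_indexes|sasi_indexes_enabled)[ \t]*:.*$",
    re.MULTILINE,
)


def enable_sasi(yaml_text: str) -> str:
    """Return the yaml text with the SASI setting forced to true."""
    patched, count = _SASI_LINE.subn(r"\1\2: true", yaml_text)
    if count == 0:
        # Setting not present: append the name used in this thread
        # (assumption: the running Cassandra version accepts that name).
        patched = yaml_text.rstrip("\n") + "\nenable_sasi_indexes: true\n"
    return patched


sample = "cluster_name: 'Test Cluster'\nenable_sasi_indexes: false\nnum_tokens: 16\n"
print(enable_sasi(sample))
```

The patched file would then be mounted over the container's config (assumed to live at /etc/cassandra/cassandra.yaml in the official image) so that it is in place before Cassandra boots.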

Currently, I don't know a way to detect at runtime whether SASI is enabled, so as to produce a better message when it isn't. Someone can teach me a way or, better yet, contribute one (I personally have never used Cassandra in prod; I am just trying to help anyway).

Sadly there aren't any active Cassandra experts on this project anymore. Plus, I have failed to get even small patches to the Cassandra code landed. So, I would suggest contacting someone in the Cassandra ecosystem directly about what seems a very common problem: there's no way to enable SASI by default again without hassle. You will probably be told no (like everyone else), as they believe yaml should be the only way and properties disallowed :shrug:

codefromthecrypt avatar Feb 29 '24 01:02 codefromthecrypt

I think we can check this heuristically. Basically the schema install silently fails, so we can double-check for things that should have been installed if SASI were enabled, but weren't: https://github.com/openzipkin/zipkin/pull/3741
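The idea of such a heuristic can be sketched in a few lines of Python. This is an illustration only, not the code in the linked pull request. The function names and index names are hypothetical, and the installed names are assumed to come from a query such as SELECT index_name FROM system_schema.indexes WHERE keyspace_name = 'zipkin2'.

```python
from typing import Iterable, List


def missing_indexes(expected: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Return expected index names that are absent from the installed ones."""
    installed_set = {name.lower() for name in installed}
    return sorted(name for name in expected if name.lower() not in installed_set)


def require_indexes(expected: Iterable[str], installed: Iterable[str]) -> None:
    """Fail loudly instead of leaving a half-installed schema unnoticed."""
    missing = missing_indexes(expected, installed)
    if missing:
        raise RuntimeError(
            "Schema install incomplete, missing indexes: "
            + ", ".join(missing)
            + ". SASI may be disabled in cassandra.yaml."
        )


# Hypothetical index names, for illustration only.
expected = ["example_span_idx", "example_tags_idx"]
installed = ["example_span_idx"]
print(missing_indexes(expected, installed))
```

Raising at startup, rather than on the first query, would also address the request in the original report that a failed migration be visible to operators.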

codefromthecrypt avatar Feb 29 '24 02:02 codefromthecrypt

@PierreF 3.1.1, on the way out, will be explicit about the error in Cassandra (swallowed by the healthcheck, I think). The main thing is we cannot change the SASI settings, as they are fixed and must be in cassandra.yaml before Cassandra starts: https://github.com/openzipkin/zipkin/pull/3741

codefromthecrypt avatar Mar 07 '24 06:03 codefromthecrypt

That works for me. The issue I had was that Zipkin wasn't working with Cassandra 4 for no visible reason. Now there is an error message with instructions to fix it.

Thanks for your quick response.

PierreF avatar Mar 07 '24 08:03 PierreF