
Zipkin with ScyllaDB

Open mkorolyov opened this issue 1 year ago • 4 comments

Hi. I tried to run Zipkin with ScyllaDB using the cassandra3 storage type (STORAGE_TYPE=cassandra3). Since ScyllaDB is meant to be a drop-in replacement for Cassandra, most of the API calls should work as is. Nevertheless, Zipkin failed to run properly.

I used this docker-compose.yml:

version: "3"

networks:
  zipkin:
    driver: bridge

services:
  zipkin:
    restart: always
    image: openzipkin/zipkin:latest
    depends_on:
      - scylladb
    ports:
      - 9411:9411
    networks:
      - zipkin
    environment:
        - STORAGE_TYPE=cassandra3
        - CASSANDRA_CONTACT_POINTS=scylladb
        - CASSANDRA_KEYSPACE=zipkin2
        - CASSANDRA_LOCAL_DC=datacenter1
        - CASSANDRA_ENSURE_SCHEMA=true

  scylladb:
    restart: always
    image: scylladb/scylla:latest
    ports:
      - 9042:9042
    volumes:
      - .docker/scylladb/1:/var/lib/scylla
    networks:
      - zipkin
    healthcheck:
      test: ["CMD", "cqlsh", "-e", "describe keyspaces"]
      interval: 1s
      retries: 120
      timeout: 1s

This starts a ScyllaDB node and Zipkin. After startup, Zipkin periodically logs a message saying that it is trying to ensure the schema:

zipkin-zipkin-1  | 2023-09-10 22:03:21.140  INFO [/] 1 --- [cking-tasks-1-1] z.s.c.Schema                             : Installing schema /zipkin2-schema.cql for keyspace zipkin2
zipkin-zipkin-1  | 2023-09-10 22:03:28.372  INFO [/] 1 --- [cking-tasks-1-1] z.s.c.Schema                             : Installing schema /zipkin2-schema.cql for keyspace zipkin2
zipkin-zipkin-1  | 2023-09-10 22:03:30.462  INFO [/] 1 --- [cking-tasks-1-2] z.s.c.Schema                             : Installing schema /zipkin2-schema.cql for keyspace zipkin2

I then tried sending a sample trace to Zipkin via curl:

curl -vvv -X POST \
  http://localhost:9411/api/v2/spans \
  -H 'Content-Type: application/json' \
  -d '[
    {
      "traceId": "1234567890abcdef",
      "id": "1234567890abcdef",
      "name": "example-service",
      "timestamp": 1629852000000000,
      "duration": 1000000,
      "localEndpoint": {
        "serviceName": "service-A",
        "ipv4": "127.0.0.1"
      },
      "kind": "SERVER",
      "tags": {
        "http.method": "GET",
        "http.path": "/api/resource"
      }
    }
  ]'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 127.0.0.1:9411...
* Connected to localhost (127.0.0.1) port 9411 (#0)
> POST /api/v2/spans HTTP/1.1
> Host: localhost:9411
> User-Agent: curl/8.1.2
> Accept: */*
> Content-Type: application/json
> Content-Length: 396
>
< HTTP/1.1 202 Accepted
< content-length: 0
< server: Armeria/1.17.2
< date: Sun, 10 Sep 2023 22:04:22 GMT
<
* Connection #0 to host localhost left intact
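
A side note on the payload above: Zipkin v2 span timestamps are epoch microseconds, and 1629852000000000 falls in August 2021, so a search restricted to a recent time window would probably not return this span even with working storage. A minimal sketch for converting between epoch microseconds and readable instants; the class and method names are hypothetical and only for illustration:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper for Zipkin-style epoch-microsecond timestamps. */
final class SpanTimestamps {
  /** Converts an Instant to microseconds since the Unix epoch. */
  static long toEpochMicros(Instant instant) {
    return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
  }

  /** Converts microseconds since the Unix epoch back to an Instant. */
  static Instant fromEpochMicros(long micros) {
    return Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
  }

  public static void main(String[] args) {
    // The timestamp used in the curl request above.
    System.out.println(fromEpochMicros(1629852000000000L));
    // A current timestamp that could be sent in the "timestamp" field instead.
    System.out.println(toEpochMicros(Instant.now()));
  }
}
```

Sending a span with a current timestamp makes it easier to tell a storage problem apart from a query window that simply does not cover the span.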

There were no errors in the Zipkin logs. Then I tried to open the Zipkin UI [Screenshot 2023-09-11 at 00 13 58] and checked the logs, where there was an error:

zipkin-zipkin-1  | 2023-09-10 22:03:32.497  WARN [/] 1 --- [-worker-nio-2-3] z.s.i.BodyIsExceptionMessage             : Unexpected error handling request.
zipkin-zipkin-1  |
zipkin-zipkin-1  | java.lang.RuntimeException: Node 29be6b37-e669-4125-82c9-3bc3436072cb is running Cassandra 3.0.8, but minimum version is 3.11.3
zipkin-zipkin-1  | 	at zipkin2.storage.cassandra.Schema.ensureVersion(Schema.java:99) ~[zipkin-storage-cassandra-2.24.3.jar:?]
zipkin-zipkin-1  | 	at zipkin2.storage.cassandra.Schema.applyCqlFile(Schema.java:140) ~[zipkin-storage-cassandra-2.24.3.jar:?]
zipkin-zipkin-1  | 	at zipkin2.storage.cassandra.Schema.ensureExists(Schema.java:112) ~[zipkin-storage-cassandra-2.24.3.jar:?]
zipkin-zipkin-1  | 	at zipkin2.storage.cassandra.DefaultSessionFactory.create(DefaultSessionFactory.java:45) ~[zipkin-storage-cassandra-2.24.3.jar:?]
zipkin-zipkin-1  | 	at zipkin2.storage.cassandra.LazySession.get(LazySession.java:39) ~[zipkin-storage-cassandra-2.24.3.jar:?]
zipkin-zipkin-1  | 	at zipkin2.storage.cassandra.CassandraStorage.session(CassandraStorage.java:152) ~[zipkin-storage-cassandra-2.24.3.jar:?]
zipkin-zipkin-1  | 	at zipkin2.storage.cassandra.CassandraSpanStore.<init>(CassandraSpanStore.java:63) ~[zipkin-storage-cassandra-2.24.3.jar:?]
zipkin-zipkin-1  | 	at zipkin2.storage.cassandra.CassandraStorage.spanStore(CassandraStorage.java:164) ~[zipkin-storage-cassandra-2.24.3.jar:?]
zipkin-zipkin-1  | 	at zipkin2.storage.cassandra.CassandraStorage.serviceAndSpanNames(CassandraStorage.java:176) ~[zipkin-storage-cassandra-2.24.3.jar:?]
zipkin-zipkin-1  | 	at zipkin2.server.internal.ZipkinQueryApiV2.getServiceNames(ZipkinQueryApiV2.java:97) ~[classes/:?]
zipkin-zipkin-1  | 	at com.linecorp.armeria.internal.server.annotation.AnnotatedService.invoke(AnnotatedService.java:413) ~[armeria-1.17.2.jar:?]
zipkin-zipkin-1  | 	at com.linecorp.armeria.internal.server.annotation.AnnotatedService.lambda$serve0$8(AnnotatedService.java:382) ~[armeria-1.17.2.jar:?]
zipkin-zipkin-1  | 	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646) ~[?:?]
zipkin-zipkin-1  | 	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) ~[?:?]
zipkin-zipkin-1  | 	at com.linecorp.armeria.common.RequestContext.lambda$makeContextAware$3(RequestContext.java:555) ~[armeria-1.17.2.jar:?]
zipkin-zipkin-1  | 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
zipkin-zipkin-1  | 	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
zipkin-zipkin-1  | 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
zipkin-zipkin-1  | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
zipkin-zipkin-1  | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
zipkin-zipkin-1  | 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.95.Final.jar:4.1.95.Final]
zipkin-zipkin-1  | 	at java.lang.Thread.run(Thread.java:833) [?:?]
zipkin-zipkin-1  |
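
The exception above is raised by a minimum-version gate: the node reports 3.0.8 as its Cassandra version, which sorts below 3.11.3. A minimal sketch of that kind of comparison follows. The class is a hypothetical helper written for illustration, not Zipkin's actual Schema.ensureVersion code, and it assumes purely numeric, dot-separated version strings:

```java
import java.util.Arrays;

/** Hypothetical helper; NOT Zipkin's actual Schema.ensureVersion implementation. */
final class MinVersionCheck {
  /** Parses "3.11.3" into {3, 11, 3}; assumes numeric, dot-separated parts. */
  static int[] parse(String version) {
    return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
  }

  /** Negative if a is older than b, zero if equal, positive if newer. */
  static int compare(String a, String b) {
    int[] x = parse(a);
    int[] y = parse(b);
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? x[i] : 0; // missing parts count as 0
      int yi = i < y.length ? y[i] : 0;
      if (xi != yi) return Integer.compare(xi, yi);
    }
    return 0;
  }

  /** Throws if the reported version is older than the required minimum. */
  static void ensureVersion(String reported, String minimum) {
    if (compare(reported, minimum) < 0) {
      throw new IllegalStateException(
          "Node is running Cassandra " + reported + ", but minimum version is " + minimum);
    }
  }

  public static void main(String[] args) {
    ensureVersion("3.11.3", "3.11.3"); // passes silently
    try {
      ensureVersion("3.0.8", "3.11.3"); // the version reported in this issue
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A gate like this only inspects the reported version number. Passing it says nothing about whether the features the schema relies on are actually available on the node.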

I dug into the Zipkin source code, lowered the minimum supported version to 3.0.8 (which is what ScyllaDB currently reports), and built from source to check whether it would work. But when opening the UI, I got a different error:

com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: Unsupported CUSTOM INDEX class org.apache.cassandra.index.sasi.SASIIndex. Note that currently, Scylla does not support SASI or any other CUSTOM INDEX class.

What can be done here in order to support ScyllaDB as a storage type and get the integration working? I'm ready to help with it.

mkorolyov, Sep 10 '23 22:09

Thanks @mkorolyov, I'm interested in this as well and happy to help.

guy9, Sep 11 '23 05:09

The main issue is "Scylla does not support SASI", which is how a lot of the indexing works. So we could make ScyllaDB work quickly if search is disabled. Then we can think about how to replace the SASI indexes to make it work fully. Does this seem like an incremental way out?

https://github.com/openzipkin/zipkin/tree/master/zipkin-storage/cassandra#trace-indexing
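
With search disabled (the Zipkin server exposes a SEARCH_ENABLED=false setting for this), reading data back would mostly mean fetching traces by ID through the v2 API. A minimal sketch of such a lookup using the JDK's java.net.http client (Java 11+); the base URL and trace ID are taken from the report above, the class name is hypothetical, and whether this path works against ScyllaDB is exactly what would need to be verified:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client sketch for looking up a single trace by ID. */
final class TraceById {
  /** Builds the Zipkin v2 "get trace by ID" URL for a given server and trace ID. */
  static URI traceUri(String baseUrl, String traceId) {
    return URI.create(baseUrl + "/api/v2/trace/" + traceId);
  }

  public static void main(String[] args) throws Exception {
    // Assumes a Zipkin server listening on localhost:9411, as in the compose file.
    URI uri = traceUri("http://localhost:9411", "1234567890abcdef");
    HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    // Print the status and the JSON body (a list of spans when the trace is found).
    System.out.println(response.statusCode());
    System.out.println(response.body());
  }
}
```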

Another issue I recall from the past was having a test image to make sure that what we think works actually works. I assume we can make a Docker image similar to https://github.com/openzipkin/zipkin/tree/master/docker/test-images/zipkin-cassandra without violating any of their licenses. If we can't host a Docker image, it is unlikely we can promise anything beyond "best efforts".

codefromthecrypt, Dec 15 '23 01:12

So as far as I can tell, there is no indexing strategy shared with Scylla, even for c* 5 interop. This means the only way to support Scylla would be to resurrect the old manual indexing, which was a chore to maintain. If I get two replies saying someone would help with this maintenance, we can talk about a plan. Otherwise, I'll update the docs to explain why not and close this.

Note: currently no one is helping with Cassandra except me, and I have never used it and don't work on tracing in my day job either, so this is too tall a task to take on in my "couch time".

codefromthecrypt, Feb 18 '24 23:02

Thanks @codefromthecrypt, let me look into this.

guy9, Feb 20 '24 11:02