
Zipkin not working with Opensearch - appears to be double encoding UTF-8

Open mikebars opened this issue 1 year ago • 4 comments

Describe the Bug

Zipkin fails to start when using OpenSearch: its /health endpoint returns 503 and reports the storage as DOWN. It succeeds when using Elasticsearch.

Steps to Reproduce

  1. run docker compose up and wait for containers
  2. run curl --verbose http://0.0.0.0:9200 to see Elasticsearch / Opensearch information
  3. run curl --verbose http://0.0.0.0:9411/health to see Zipkin health

Elasticsearch (working)

docker-compose.yml:

services:
  elasticsearch:
    container_name: elasticsearch
    environment:
      - _JAVA_OPTIONS=-Xms512m -Xmx512m -XX:UseSVE=0
      - action.destructive_requires_name=false
      - discovery.type=single-node
      - http.host=0.0.0.0
      - transport.host=127.0.0.1
      - xpack.monitoring.collection.enabled=false
      - xpack.security.enabled=false
      - xpack.security.http.ssl.enabled=false
    healthcheck:
      interval: 5s
      retries: 10
      start_period: 10s
      test: curl --silent http://localhost:9200/_cluster/health | grep --extended-regexp '"status":"(green|yellow)"'
      timeout: 10s
    image: elastic/elasticsearch:8.17.2
    restart: on-failure
    ports:
      - "9200:9200"
      - "9300:9300"

  zipkin:
    container_name: zipkin
    depends_on:
      elasticsearch:
        condition: service_healthy
    environment:
      - ES_HOSTS=http://elasticsearch:9200
      - JAVA_OPTS=-XX:UseSVE=0
      - STORAGE_TYPE=elasticsearch
    image: openzipkin/zipkin:latest
    ports:
      - "9411:9411"
    restart: on-failure

output of curl --verbose http://0.0.0.0:9200:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 0.0.0.0:9200...
* Connected to 0.0.0.0 (127.0.0.1) port 9200 (#0)
> GET / HTTP/1.1
> Host: 0.0.0.0:9200
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< X-elastic-product: Elasticsearch
< content-type: application/json
< content-length: 541
< 
{ [541 bytes data]

100   541  100   541    0     0  46409      0 --:--:-- --:--:-- --:--:-- 49181
* Connection #0 to host 0.0.0.0 left intact
{
  "name" : "662459531c8d",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "uw14QxYCTM2HEg5OZsnWKg",
  "version" : {
    "number" : "8.17.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "747663ddda3421467150de0e4301e8d4bc636b0c",
    "build_date" : "2025-02-05T22:10:57.067596412Z",
    "build_snapshot" : false,
    "lucene_version" : "9.12.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

Highlighted part of response

< content-type: application/json

output of curl --verbose http://0.0.0.0:9411/health:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 0.0.0.0:9411...
* Connected to 0.0.0.0 (127.0.0.1) port 9411 (#0)
> GET /health HTTP/1.1
> Host: 0.0.0.0:9411
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/json; charset=utf-8
< content-length: 209
< server: Armeria/1.31.3
< date: Mon, 17 Feb 2025 20:21:47 GMT
< 
{ [209 bytes data]

100   209  100   209    0     0  20727      0 --:--:-- --:--:-- --:--:-- 20900
* Connection #0 to host 0.0.0.0 left intact
{
  "status" : "UP",
  "zipkin" : {
    "status" : "UP",
    "details" : {
      "ElasticsearchStorage{initialEndpoints=http://elasticsearch:9200, index=zipkin}" : {
        "status" : "UP"
      }
    }
  }
}

Opensearch (not working)

docker-compose.yml:

services:
  opensearch:
    container_name: opensearch
    environment:
      - _JAVA_OPTIONS=-XX:UseSVE=0
      - action.destructive_requires_name=false
      - DISABLE_INSTALL_DEMO_CONFIG=true
      - DISABLE_SECURITY_PLUGIN=true
      - discovery.type=single-node
      - http.host=0.0.0.0
      - transport.host=127.0.0.1
    healthcheck:
      interval: 5s
      retries: 10
      start_period: 10s
      test: curl --silent http://localhost:9200/_cluster/health | grep --extended-regexp '"status":"(green|yellow)"'
      timeout: 10s
    image: opensearchproject/opensearch:latest
    restart: on-failure
    ports:
      - "9200:9200"
      - "9600:9600"

  zipkin:
    container_name: zipkin
    depends_on:
      opensearch:
        condition: service_healthy
    environment:
      - ES_HOSTS=http://opensearch:9200
      - JAVA_OPTS=-XX:UseSVE=0
      - STORAGE_TYPE=elasticsearch
    image: openzipkin/zipkin:latest
    ports:
      - "9411:9411"
    restart: on-failure

output of curl --verbose http://0.0.0.0:9200:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 0.0.0.0:9200...
* Connected to 0.0.0.0 (127.0.0.1) port 9200 (#0)
> GET / HTTP/1.1
> Host: 0.0.0.0:9200
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 568
< 
{ [568 bytes data]

100   568  100   568    0     0  50818      0 --:--:-- --:--:-- --:--:-- 51636
* Connection #0 to host 0.0.0.0 left intact
{
  "name" : "bd49f6011512",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "dBYK3GZERkeEqDPB51Pghg",
  "version" : {
    "distribution" : "opensearch",
    "number" : "2.19.0",
    "build_type" : "tar",
    "build_hash" : "fd9a9d90df25bea1af2c6a85039692e815b894f5",
    "build_date" : "2025-02-05T16:13:57.130576800Z",
    "build_snapshot" : false,
    "lucene_version" : "9.12.1",
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}

Highlighted part of response

< content-type: application/json; charset=UTF-8

output of curl --verbose http://0.0.0.0:9411/health:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 0.0.0.0:9411...
* Connected to 0.0.0.0 (127.0.0.1) port 9411 (#0)
> GET /health HTTP/1.1
> Host: 0.0.0.0:9411
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 503 Service Unavailable
< content-type: application/json; charset=utf-8
< content-length: 1055
< server: Armeria/1.31.3
< date: Mon, 17 Feb 2025 20:08:03 GMT
< 
{ [1055 bytes data]

100  1055  100  1055    0     0   104k      0 --:--:-- --:--:-- --:--:--  114k
* Connection #0 to host 0.0.0.0 left intact
{
  "status" : "DOWN",
  "zipkin" : {
    "status" : "DOWN",
    "details" : {
      "ElasticsearchStorage{initialEndpoints=http://opensearch:9200, index=zipkin}" : {
        "status" : "DOWN",
        "details" : {
          "error" : "IllegalArgumentException: .version.number not found in response: �\u0006\u0000\u0000sNaPpY\u0000�\u0001\u0000�爞�\u0004l{\n  \"name\" : \"bd49f6011512\",\u0001\u001B\u001Ccluster_\u0015#\u0018docker-\r\u00186%\u0000\fuuid\u0005HTdBYK3GZERkeEqDPB51Pghg\t-\u0018version\u0001(<{\n    \"distribut\r\u0017(\"opensearch\u00053\u0001�\u0010umber\u00014\u0018\"2.19.0\u0011\u0019 build_typ\t�\btar6\u001A\u0000\fhash\u00057�fd9a9d90df25bea1af2c6a85039692e815b894f5\"\u0001�\u0004  \rY\fdate\u0005?t2025-02-05T16:13:57.130576800Z6t\u0000\u001Csnapshot\u00019\u0010false\rS\u0018lucene_=\u0001\u0018\"9.12.1\u0011?dminimum_wire_compatibility25\u0000\f7.109\u0002\u00115\u0010indexr6\u0000\u00015\f\n  }\u0001�\u0018\"taglin\t� The OpenS%tH Project: https://o5� .org/\"\n}\n"
        }
      }
    }
  }
}

Expected Behaviour

Zipkin should work with Opensearch the way it does with Elasticsearch

Notes

Since Elasticsearch is returning a response with

< content-type: application/json

and Opensearch is returning a response with

< content-type: application/json; charset=UTF-8

I wonder if the root cause might be that the OpenSearch response is being "double encoded" as UTF-8, since the following logic appears to be common to both Elasticsearch and OpenSearch:

  • https://github.com/openzipkin/zipkin/blob/0f8fc88d33131dc938532322e19126def8cad8e8/zipkin-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/internal/client/HttpCall.java#L245
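One way to test the "double encoding" hypothesis: if the response bytes were decoded with the wrong charset (for example ISO-8859-1) and re-encoded as UTF-8, pure-ASCII JSON like the OpenSearch root response would come out byte-for-byte unchanged, and non-ASCII text would turn into mojibake rather than the binary bytes seen in the error above. The Java sketch below illustrates this under that assumed meaning of "double encoding". DoubleEncodingCheck and doubleEncode are hypothetical names, and the sketch does not use Zipkin's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: simulates "double encoding", i.e. decoding UTF-8 bytes
// with the wrong charset (ISO-8859-1) and encoding the result as UTF-8 again.
public final class DoubleEncodingCheck {
  static byte[] doubleEncode(byte[] utf8) {
    String misdecoded = new String(utf8, StandardCharsets.ISO_8859_1);
    return misdecoded.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] ascii = "{\"number\" : \"2.19.0\"}".getBytes(StandardCharsets.UTF_8);
    byte[] accented = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

    // Pure-ASCII JSON is unchanged byte-for-byte.
    System.out.println(Arrays.equals(ascii, doubleEncode(ascii))); // true
    // Non-ASCII text becomes mojibake: the accented char turns into two chars.
    String mangled = new String(doubleEncode(accented), StandardCharsets.UTF_8);
    System.out.println(mangled.length()); // 5, while the original has 4 chars
  }
}
```

Since every byte of the OpenSearch JSON is ASCII, a charset mix-up alone would leave it readable, which suggests looking for a different transformation of the body.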

mikebars, Feb 17 '25 20:02

Zipkin works with OpenSearch 2.17 but fails with the above error on OpenSearch 2.19.

mshivanna, May 20 '25 15:05

We are having the same issue. Is there any expectation of it being solved soon?

alfredo-gil, Jul 23 '25 11:07

The BaseVersion.convert() method compares Strings with '=='. I'm not sure this causes this issue, but it should be an equals() call.

            if (parser.currentToken() == JsonToken.VALUE_STRING) {
              if (parser.currentName() == "distribution") {
                distribution = parser.getText();
              } else if (parser.currentName() == "number") {
                version = parser.getText();
              }
            }

EDIT: I raised #3809 to fix this.
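A minimal sketch of the equals()-based comparison, without Jackson: fieldName stands in for parser.currentName(), and FieldNameCompare and classify are hypothetical names. Putting the literal first ("number".equals(name)) also handles a null name safely. Jackson documents JsonFactory.Feature.INTERN_FIELD_NAMES as enabled by default, so '==' may happen to work with a default parser, but equals() does not rely on that setting.

```java
// Hypothetical stand-in for the field-name checks in BaseVersion.convert():
// fieldName plays the role of parser.currentName(); Jackson is not used here.
public final class FieldNameCompare {
  // equals() compares characters; putting the literal first also tolerates null.
  static String classify(String fieldName) {
    if ("distribution".equals(fieldName)) return "distribution";
    if ("number".equals(fieldName)) return "number";
    return "other";
  }

  public static void main(String[] args) {
    // A name created at runtime is a different object from the literal.
    String runtimeName = new String("number");
    System.out.println(runtimeName == "number");      // false: different objects
    System.out.println("number".equals(runtimeName)); // true: same characters
    System.out.println(classify(runtimeName));        // number
  }
}
```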

tmurakam avatar Aug 27 '25 09:08 tmurakam

I found the root cause. OpenSearch 2.19 returns JSON with HTTP compression (content-encoding=snappy), and Zipkin fails to uncompress it. (OpenSearch 2.17 does not compress it.) I think Zipkin should send its HTTP requests with an Accept-Encoding header of "identity" or "gzip".
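The garbled text in the health error begins with \u0006\u0000\u0000sNaPpY, which matches the stream identifier chunk of the Snappy framing format: chunk type 0xff, a 3-byte little-endian length of 6, then the ASCII "sNaPpY". The JDK-only sketch below checks a body for that prefix and builds a request carrying Accept-Encoding: identity, as suggested above. It assumes OpenSearch honors that header. SnappyDiagnosis and its methods are hypothetical names, not part of Zipkin or Armeria.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helpers; not Zipkin's actual HTTP client code.
public final class SnappyDiagnosis {
  // Snappy framing format: every stream starts with this identifier chunk
  // (type 0xff, 3-byte little-endian length 6, then ASCII "sNaPpY").
  private static final byte[] STREAM_IDENTIFIER = {
    (byte) 0xff, 0x06, 0x00, 0x00, 's', 'N', 'a', 'P', 'p', 'Y'
  };

  // True if the body begins with the Snappy framing stream identifier.
  static boolean looksLikeSnappyFramed(byte[] body) {
    if (body.length < STREAM_IDENTIFIER.length) return false;
    for (int i = 0; i < STREAM_IDENTIFIER.length; i++) {
      if (body[i] != STREAM_IDENTIFIER[i]) return false;
    }
    return true;
  }

  // Builds a request asking the server to send the body uncompressed.
  static HttpRequest uncompressedRequest(String baseUrl) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/"))
        .header("Accept-Encoding", "identity")
        .GET()
        .build();
  }

  public static void main(String[] args) {
    byte[] json = "{\"name\" : \"bd49f6011512\"}".getBytes(StandardCharsets.UTF_8);
    // Only the identifier prefix is checked; the bytes after it in this fake
    // body are plain JSON, not valid Snappy chunk data.
    byte[] framed = new byte[STREAM_IDENTIFIER.length + json.length];
    System.arraycopy(STREAM_IDENTIFIER, 0, framed, 0, STREAM_IDENTIFIER.length);
    System.arraycopy(json, 0, framed, STREAM_IDENTIFIER.length, json.length);

    System.out.println(looksLikeSnappyFramed(json));   // false
    System.out.println(looksLikeSnappyFramed(framed)); // true
    HttpRequest request = uncompressedRequest("http://opensearch:9200");
    System.out.println(request.headers().firstValue("Accept-Encoding").orElse("")); // identity
  }
}
```

The request is only built here, not sent, so the sketch runs without a live OpenSearch node.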

I found a workaround for this. You can set http.compression to false in opensearch.yml to disable HTTP compression, or you can use HTTPS on OpenSearch, because compression is turned off when using HTTPS.

http.compression: false

tmurakam, Aug 27 '25 11:08