Promscale as a Jaeger certified storage backend
Add integration tests for Promscale to upstream Jaeger so that we can claim Promscale meets all the expectations.
EDIT
Summary
Storage integration test failures can be broadly classified into the following categories:
- [x] Get Trace with >10K spans in a single query - https://github.com/timescale/promscale/issues/1621#issuecomment-1238070108 (fixed by https://github.com/jaegertracing/jaeger/issues/3912)
- [x] Empty span name - https://github.com/timescale/promscale/issues/1621#issuecomment-1239182178
- [ ] Binary tags are returned as String tags - Jaeger to Otel translator encodes binary value with base64 - https://github.com/timescale/promscale/issues/1621#issuecomment-1239054206
- [x] Span x Event table join issue - https://github.com/timescale/promscale/issues/1621#issuecomment-1239073939
- [ ] Query a tag which is spread across multiple events within a single trace/span - https://github.com/timescale/promscale/issues/1621#issuecomment-1240612369
Steps to run the integration tests
- Run Promscale
- Apply the patch https://github.com/jaegertracing/jaeger/compare/main...arajkumar:jaeger:promscale-storage-int-tests.diff to the Jaeger repo
- Run the following from the Jaeger repo:
```sh
go clean -testcache && (cd examples/memstore-plugin/ && go build .) && STORAGE=grpc-plugin go test ./plugin/storage/integration -tags grpc_storage_integration -run="TestExternalGRPCStorage" -v
```
With a little bit of tinkering, I was able to run the upstream Jaeger integration tests:
```go
t.Run("GetServices", s.testGetServices)
t.Run("GetOperations", s.testGetOperations)
t.Run("GetTrace", s.testGetTrace)
t.Run("GetLargeSpans", s.testGetLargeSpan)
t.Run("FindTraces", s.testFindTraces)
t.Run("GetDependencies", s.testGetDependencies)
```
In the above list, the first 3 are passing (with a little modification), but the other tests will need some changes in Promscale.
t.Run("GetLargeSpans", s.testGetLargeSpan)
The above test ingests 10K spans under a single trace ID and retrieves them in a single GetTraces operation.
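As a sanity check on the ingestion side, the stored span count for that trace can be inspected directly in the DB. This query is purely illustrative and not part of the test; the trace ID is the one that shows up in the _ps_trace.span output later in this issue.

```sql
-- Hypothetical sanity check: count the rows stored for the large trace.
SELECT count(*)
FROM _ps_trace.span
WHERE trace_id = '00000000-0000-0000-0000-000000000011';
```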
Test execution summary:
--- FAIL: TestExternalGRPCStorage (212.15s)
--- PASS: TestExternalGRPCStorage/GetServices (0.21s)
--- PASS: TestExternalGRPCStorage/GetOperations (0.00s)
--- PASS: TestExternalGRPCStorage/GetTrace (0.01s)
--- PASS: TestExternalGRPCStorage/GetTrace/NotFound_error (0.00s)
--- FAIL: TestExternalGRPCStorage/FindTraces (211.93s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Tags_in_one_spot_-_Tags (10.72s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Tags_in_one_spot_-_Logs (10.70s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Tags_in_one_spot_-_Process (10.64s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Tags_in_different_spots (10.64s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Trace_spans_over_multiple_indices (10.39s)
--- PASS: TestExternalGRPCStorage/FindTraces/Operation_name (0.01s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Operation_name_+_max_Duration (10.43s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Operation_name_+_Duration_range (10.37s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Duration_range (10.37s)
--- FAIL: TestExternalGRPCStorage/FindTraces/max_Duration (10.40s)
--- FAIL: TestExternalGRPCStorage/FindTraces/default (10.40s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Tags_+_Operation_name (10.69s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Tags_+_Operation_name_+_max_Duration (10.74s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Tags_+_Operation_name_+_Duration_range (0.02s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Tags_+_Duration_range (10.81s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Tags_+_max_Duration (10.69s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name (10.81s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name_+_max_Duration (10.76s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name_+_Duration_range (10.70s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Multi-spot_Tags_+_Duration_range (10.70s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Multi-spot_Tags_+_max_Duration (10.56s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Multiple_Traces (10.39s)
--- SKIP: TestExternalGRPCStorage/GetDependencies (0.00s)
FAIL: TestExternalGRPCStorage/FindTraces/Tags_in_one_spot_-_Tags (10.72s)
The current find-trace query does an INNER JOIN between the span and event tables, which fails to find traces whose spans don't have any events. We must use a LEFT JOIN instead.
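For illustration, a minimal sketch of the join change (the real query is generated by Promscale and selects far more columns):

```sql
-- Sketch only: a LEFT JOIN keeps spans that have no events,
-- whereas the current INNER JOIN drops them from the result.
SELECT s.trace_id, s.span_id, e.name AS event_name
FROM _ps_trace.span s
LEFT JOIN _ps_trace.event e
       ON e.trace_id = s.trace_id
      AND e.span_id  = s.span_id;
```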
EDIT:
Even after the LEFT JOIN fix, this test still fails because of a diff in the JSON output.
The span has an attribute of type "BINARY", which gets translated into a base64-encoded string[1] when stored in the DB; when Jaeger queries it back, it is returned as a string tag.
[1] https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/84ec43271330c6e1bfaf5c1483042de04798fd10/pkg/translator/jaeger/jaegerproto_to_traces.go#L263
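For illustration only, using PostgreSQL's built-in encode function rather than the translator code, this is roughly what the encoding does to a binary value:

```sql
-- A 3-byte binary value becomes the opaque string 'AQID'; once stored this
-- way it is indistinguishable from a tag that was a string to begin with.
SELECT encode('\x010203'::bytea, 'base64');  -- returns 'AQID'
```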
When I use a LEFT JOIN to merge the event and span tables, I see the following tests passing additionally, but the problem listed above remains for the other tests:
TestExternalGRPCStorage/FindTraces/Tags_in_different_spots
TestExternalGRPCStorage/FindTraces/Operation_name_+_Duration_range
TestExternalGRPCStorage/FindTraces/Duration_range
TestExternalGRPCStorage/FindTraces/max_Duration
A few more pass after adding a second of delay in the grpc_test Refresh function. Looks like a bug somewhere in the gRPC plugin implementation (note: tracing.async-acks is not enabled).
--- PASS: TestExternalGRPCStorage/FindTraces/Tags_in_different_spots (0.00s)
--- PASS: TestExternalGRPCStorage/FindTraces/Trace_spans_over_multiple_indices (0.00s)
--- PASS: TestExternalGRPCStorage/FindTraces/Operation_name (0.00s)
--- PASS: TestExternalGRPCStorage/FindTraces/Operation_name_+_max_Duration (0.00s)
--- PASS: TestExternalGRPCStorage/FindTraces/Operation_name_+_Duration_range (0.11s)
--- PASS: TestExternalGRPCStorage/FindTraces/Duration_range (0.11s)
--- PASS: TestExternalGRPCStorage/FindTraces/max_Duration (0.11s)
EDIT:
It uses the streaming span writer, which is async.
--- FAIL: TestExternalGRPCStorage/FindTraces/default (10.40s)
This one fails when the operationName field of the span is empty.
EDIT:
The following error is thrown when OperationName is empty:
ERROR: new row for relation "operation" violates check constraint "operation_span_name_check" (SQLSTATE 23514)
EDIT:
The operation table has a check constraint on the span_name column which prevents inserting empty values.
EDIT:
I can think of 2 ways to fix the problem:
- OTLP doesn't really mandate a non-empty value for the span name; it is only semantically required, and an empty value is equivalent to an unknown span name. Hence we can drop the constraint on the DB side (see the sketch after this list).
- In Promscale, set a default value if the span name is empty and strip that default from the response.
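A minimal sketch of the first option, assuming the table lives in the _ps_trace schema (the constraint name is taken from the error above; the schema name is an assumption):

```sql
-- Sketch only: relax the check so spans with empty names can be ingested.
ALTER TABLE _ps_trace.operation
  DROP CONSTRAINT operation_span_name_check;
```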
--- FAIL: TestExternalGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name (10.81s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name_+_max_Duration (10.76s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Multi-spot_Tags_+_Operation_name_+_Duration_range (10.70s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Multi-spot_Tags_+_Duration_range (10.70s)
--- FAIL: TestExternalGRPCStorage/FindTraces/Multi-spot_Tags_+_max_Duration (10.56s)
These tests have a tag query across span, resource and event. They fail when a span has 2 events with 2 distinct tags and both tags are queried at the same time.
e.g.
| trace_id | span_id | event_name | tag |
|---|---|---|---|
| 0001 | 0007 | event1 | foo=foo |
| 0001 | 0007 | event2 | bar=bar |
When you query foo=foo bar=bar, Promscale generates the following WHERE clause, which returns an empty result because a single event row can never satisfy both conditions:
e.tags @> ANY($3::jsonb[]) AND e.tags @> ANY($5::jsonb[])
It also depends on the fix from https://github.com/timescale/promscale/issues/1621#issuecomment-1239054206
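One possible shape of a fix, sketched with the same bind parameters as the generated clause above (this is not the actual query-generator output): require each queried tag to match on some event of the candidate span, instead of requiring a single event row to contain all of them.

```sql
-- Sketch only: one EXISTS per queried event tag, so foo=foo and bar=bar
-- may be satisfied by two different events of the same span.
SELECT s.trace_id, s.span_id
FROM _ps_trace.span s
WHERE EXISTS (SELECT 1
              FROM _ps_trace.event e
              WHERE e.trace_id = s.trace_id
                AND e.span_id  = s.span_id
                AND e.tags @> ANY($3::jsonb[]))   -- first tag condition (foo=foo)
  AND EXISTS (SELECT 1
              FROM _ps_trace.event e
              WHERE e.trace_id = s.trace_id
                AND e.span_id  = s.span_id
                AND e.tags @> ANY($5::jsonb[]));  -- second tag condition (bar=bar)
```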
The problem with GetLargeSpans is the array_agg. The same span is sent 10008 times in the test, which means we end up with 10008 entries in _ps_trace.span that are identical:
promscale=# select * from _ps_trace.span;
─[ RECORD 1 ]───────────────┬─────────────────────────────────────
trace_id │ 00000000-0000-0000-0000-000000000011
span_id │ 3
parent_span_id │ ∅
operation_id │ 1
start_time │ 2022-09-12 19:03:11.639875+02
end_time │ 2022-09-12 19:03:11.639975+02
duration_ms │ 0.1
instrumentation_lib_id │ ∅
resource_schema_url_id │ ∅
event_time │ empty
dropped_tags_count │ 0
dropped_events_count │ 0
dropped_link_count │ 0
resource_dropped_tags_count │ 0
status_code │ unset
trace_state │ ∅
span_tags │ {}
status_message │
resource_tags │ {"1": 1}
═[ RECORD 2 ]═══════════════╪═════════════════════════════════════
trace_id │ 00000000-0000-0000-0000-000000000011
span_id │ 3
parent_span_id │ ∅
operation_id │ 1
start_time │ 2022-09-12 19:03:22.639875+02
end_time │ 2022-09-12 19:03:22.639975+02
duration_ms │ 0.1
instrumentation_lib_id │ ∅
resource_schema_url_id │ ∅
event_time │ empty
dropped_tags_count │ 0
dropped_events_count │ 0
dropped_link_count │ 0
resource_dropped_tags_count │ 0
status_code │ unset
trace_state │ ∅
span_tags │ {}
status_message │
resource_tags │ {"1": 1}
If you look at the span fixture, it has 2 log entries that are empty except for the timestamp; the translation between Jaeger and OTEL stores these log entries as events. Looking at the _ps_trace.event table, we get the same 2 events duplicated 10008 times each (the same number of spans that were sent).
promscale=# select count(*) from _ps_trace.event;
count
═══════
20016
promscale=# select * from _ps_trace.event limit 4;
time │ trace_id │ span_id │ event_nbr │ dropped_tags_count │ name │ tags
═══════════════════════════════╪══════════════════════════════════════╪═════════╪═══════════╪════════════════════╪══════╪══════
2022-09-12 18:46:31.639875+02 │ 00000000-0000-0000-0000-000000000011 │ 3 │ 0 │ 0 │ │ {}
2022-09-12 18:46:31.639875+02 │ 00000000-0000-0000-0000-000000000011 │ 3 │ 1 │ 0 │ │ {}
2022-09-12 18:46:31.639875+02 │ 00000000-0000-0000-0000-000000000011 │ 3 │ 0 │ 0 │ │ {}
2022-09-12 18:46:31.639875+02 │ 00000000-0000-0000-0000-000000000011 │ 3 │ 1 │ 0 │ │ {}
When we try to get the trace, we use array_agg to collect the events; since all the spans have the same values, the query aggregates all 20016 events onto every row.
─[ RECORD 1 ]────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
trace_id │ 00000000-0000-0000-0000-000000000011
span_id │ 3
parent_span_id │ ∅
start_times │ 2022-09-12 18:46:31.639875+02
end_times │ 2022-09-12 18:46:31.639975+02
span_kind │ unspecified
dropped_tags_counts │ 0
dropped_events_counts │ 0
dropped_link_counts │ 0
status_code_string │ unset
status_message │
trace_states │ ∅
schema_urls │ ∅
span_names │ example-operation-1
resource_tags │ {"service.name": "example-service-1"}
span_tags │ ∅
event_names │ {"","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
event_times │ {"2022-09-12 18:46:31.639875+02","2022-09-12 18:46:31.639875+02","2022-09-12 18:46:31.639875+02","2022-09-12 18:46:31.639875+02","2022-09-12 18:46:31.639875+02","2022-09-12 18:46:31.639875+02","2022-09-12 18:46:31
event_dropped_tags_count │ {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
event_tags │ {NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NU
library_name │ ∅
library_version │ ∅
library_schema_url │ ∅
links_linked_trace_ids │ ∅
links_linked_span_ids │ ∅
links_trace_states │ ∅
links_dropped_tags_count │ ∅
links_tags │ ∅
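To see the fan-out in isolation, here is a minimal, self-contained reproduction using throwaway demo tables (not the real schema or the real query):

```sql
-- Throwaway demo: the same span stored twice, so its 2 events are stored
-- twice as well (4 event rows in total).
CREATE TEMP TABLE demo_span  (trace_id int, span_id int);
CREATE TEMP TABLE demo_event (trace_id int, span_id int, name text);
INSERT INTO demo_span  VALUES (1, 3), (1, 3);
INSERT INTO demo_event VALUES (1, 3, 'e1'), (1, 3, 'e2'),
                              (1, 3, 'e1'), (1, 3, 'e2');

-- Aggregating events per span key: every duplicated span row collects every
-- duplicated event row, i.e. 2 result rows with 4 event names each. Scaled
-- up, this is how each returned span ends up carrying all 20016 events.
SELECT s.trace_id, s.span_id, ev.event_names
FROM demo_span s
LEFT JOIN LATERAL (
    SELECT array_agg(e.name) AS event_names
    FROM demo_event e
    WHERE e.trace_id = s.trace_id
      AND e.span_id  = s.span_id
) ev ON true;
```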
Related to GetLargeSpans
If I give each span in the test a unique ID, the test passes. Maybe we can add this change to the test. We can use the following arguments to argue that the current test is not very representative of a real scenario:
The OpenTelemetry spec defines the SpanID as:
SpanId is the identifier for a span. It is globally unique with practically sufficient probability by being made as 8 randomly generated bytes. When passed to a child Span this identifier becomes the parent span id for the child Span
The Jaeger Thrift model states:
3: required i64 spanId # unique span id (only unique within a given trace)
Also, Jaeger has a deduper for non-unique span IDs, because, as the documentation says:
Jaeger UI expects all spans to have unique IDs.
I guess the other option is handling duplicates ourselves.
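A rough sketch of what handling duplicates on our side could look like (this is not the actual Promscale query; it simply collapses exact key duplicates via DISTINCT ON, using the columns shown in the outputs above):

```sql
-- Sketch only: keep one row per (trace_id, span_id) and one row per
-- (trace_id, span_id, event_nbr) before aggregating the events.
SELECT s.trace_id,
       s.span_id,
       array_agg(e.name ORDER BY e.event_nbr) AS event_names
FROM (
    SELECT DISTINCT ON (trace_id, span_id) *
    FROM _ps_trace.span
    ORDER BY trace_id, span_id, start_time
) s
LEFT JOIN (
    SELECT DISTINCT ON (trace_id, span_id, event_nbr) *
    FROM _ps_trace.event
    ORDER BY trace_id, span_id, event_nbr, time
) e ON e.trace_id = s.trace_id
   AND e.span_id  = s.span_id
GROUP BY s.trace_id, s.span_id;
```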
GetLargeSpans fixed by https://github.com/jaegertracing/jaeger/issues/3912
The problems with TestExternalGRPCStorage/FindTraces/Tags_+_Operation_name are:
Spans in the fixture have different trace_ids:
This test sends a span (a) from a trace and has 3 references to 2 other spans (b and c):
- a CHILD_OF b
- a CHILD_OF c
- a FOLLOWS_FROM c
https://github.com/jaegertracing/jaeger/blob/cac21f82950545b613f45d3984d96e1498bfd15d/plugin/storage/integration/fixtures/traces/tags_opname_trace.json#L4-L23
The problem is with the child relationship and the fact that each span has a different traceId. This breaks the parent relationship: when the translator tries to get the parent span, it cannot find a match, because all of the spans have different trace IDs and the ParentSpanID method makes the following assertion: ref.TraceID == s.TraceID && ref.RefType == ChildOf:
https://github.com/jaegertracing/jaeger/blob/cac21f82950545b613f45d3984d96e1498bfd15d/model/span.go#L99-L109
I think for this point we can propose a fix to the test: like we did for GetLargeSpans, propose a more realistic scenario.
OTEL doesn't have a concept of refType
The OTEL model doesn't define a span reference type (in the case of OTEL it would be a link). When translating from OTEL to Jaeger, the parent span is assigned the CHILD_OF refType, and all other links are assigned FOLLOWS_FROM:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/7b5eb5280c15a5cc195d68757ba4c24c2c7f8eba/pkg/translator/jaeger/traces_to_jaegerproto.go#L260-L281
They explicitly say:
Since Jaeger RefType is not captured in internal data, use SpanRefType_FOLLOWS_FROM by default. SpanRefType_CHILD_OF supposed to be set only from parentSpanID.
This is a problem for the test because the translator sets only one CHILD_OF while the test expects 2 in the result.
The expected result based on the fixture is:
"references": [
{
"trace_id": "AAAAAAAAAAAAAAAAAAAAEg==",
"span_id": "AAAAAAAAAP8="
},
{
"trace_id": "AAAAAAAAAAAAAAAAAAAAEg==",
"span_id": "AAAAAAAAAAI="
},
{
"trace_id": "AAAAAAAAAAAAAAAAAAAAAQ==",
"span_id": "AAAAAAAAAAI=",
"ref_type": 1
}
],
The first 2 are ref_type=0 which is CHILD_OF.
The actual result we get, because the translator sets only one CHILD_OF, is:
"references": [
{
"trace_id": "AAAAAAAAAAAAAAAAAAAAEg==",
"span_id": "AAAAAAAAAP8="
},
{
"trace_id": "AAAAAAAAAAAAAAAAAAAAEg==",
"span_id": "AAAAAAAAAAI=",
"ref_type": 1
},
{
"trace_id": "AAAAAAAAAAAAAAAAAAAAAQ==",
"span_id": "AAAAAAAAAAI=",
"ref_type": 1
}
],
As you can see, since the translator takes the first CHILD_OF as the span's parent ID, all the other relationships become FOLLOWS_FROM.
Again, we can ask for a more realistic test which sets only one parent for the span.
Aggregates for links are not returning data
The third problem is that the query doesn't seem to be returning all the links properly. The aggregate function's FILTER clause doesn't seem to be working correctly.
We are doing FILTER(WHERE lk IS NOT NULL), but that seems to be filtering out everything. Maybe the WHERE clause requires an explicit column:
SELECT
s.trace_id,
s.span_id,
array_agg(lk.linked_trace_id ORDER BY lk.link_nbr) FILTER(WHERE lk.trace_id IS NOT NULL) links_lkTraceId_not_null,
array_agg(lk.linked_trace_id ORDER BY lk.link_nbr) FILTER(WHERE lk IS NOT NULL) links_lk_not_null
FROM
_ps_trace.span s
LEFT JOIN
_ps_trace.link lk ON lk.trace_id = s.trace_id AND lk.span_id = s.span_id
GROUP BY
s.trace_id,
s.span_id;
trace_id │ span_id │ links_lktraceid_not_null │ links_lk_not_null
══════════════════════════════════════╪═════════╪════════════════════════════════════════╪═══════════════════
00000000-0000-0000-0000-000000000001 │ 4 │ {00000000-0000-0000-0000-000000000001} │ ∅
(1 row)
Time: 101.343 ms
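A likely explanation, based on PostgreSQL's row-comparison semantics rather than anything Promscale-specific: applied to a whole row, IS NOT NULL is true only when every column of that row is non-null, so a matched link row that has any NULL column gets filtered out, while filtering on an explicit non-null column behaves as expected.

```sql
-- Row-wise NULL tests in PostgreSQL: a row mixing NULL and non-NULL fields
-- is neither IS NULL nor IS NOT NULL.
SELECT ROW(1, NULL) IS NOT NULL;  -- false
SELECT ROW(1, NULL) IS NULL;      -- false
```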
GetLargeSpans fixed by jaegertracing/jaeger#3912
The GetLargeSpans test still fails when it is run together with all the other tests, specifically GetTrace. Here is a test summary:
/plugin/storage/integration/integration_test.go:133
Error: Not equal:
expected: 10008
actual : 10012
Test: TestExternalGRPCStorage/GetLargeSpans
The actual problem is that both GetLargeSpans and GetTrace ingest test traces with the same trace ID and span ID, and we end up getting more than the expected number of results.
Hmm, the expectation is to run tests in isolation. See https://github.com/jaegertracing/jaeger/blob/4fc291568d8ac59a1c67cc47ee1d91ab20dd06c4/plugin/storage/integration/integration_test.go#L119
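For context on what such isolation implies on the storage side (a hedged sketch, not the actual change in the commit linked below): each test would need to start from an empty store, e.g. by truncating the trace tables named elsewhere in this issue between tests.

```sql
-- Sketch only: wipe the trace tables so each integration test starts from a
-- clean slate; CASCADE covers any foreign-key dependencies between them.
TRUNCATE _ps_trace.span, _ps_trace.event, _ps_trace.link CASCADE;
```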
Fixed in https://github.com/timescale/promscale/pull/1627/commits/f02d9a8c3ebc3efc3619d89931edee69bba74824
All tests are passing now.