
Dapr OpenTelemetry Semantic Convention and Metric Naming

Open 1046102779 opened this issue 2 years ago • 3 comments

In what area(s)?

/area runtime

Describe the feature

Keep only the trace SDK and the metric SDK from OpenTelemetry, and standardize how observability data is reported.

First we need to unify the semantics of observability.

OpenTelemetry Semantic Convention

| current name | proposed name | value | description |
| --- | --- | --- | --- |
| app_id | service.name | - | custom value |
| namespace | service.namespace | - | custom value, such as default, dapr-system, etc. |

OpenTelemetry Dapr Extension Semantic Convention

| current name | proposed name | value | description |
| --- | --- | --- | --- |
| process_status | dapr.component.pubsub.process_status | SUCCESS;RETRY;DROP | pub/sub component message return status |
| success | dapr.component.success | true;false | component handler return status |
| topic | dapr.component.pubsub.topic | custom | pub/sub component topic |
| componentName | dapr.component.name | custom | component name |
| component | dapr.component.type | state;binding;configuration;pubsub | component type |
| operation | dapr.component.operation | custom | component operation |
| - | rpc.method | - | HTTP method: GET, POST... |
| dapr.api | rpc.api | - | RPC={RPCName} |
| - | rpc.type | Server;Client | RPC type |
| - | rpc.status_code | Unset;Ok;Error | RPC error, gRPC and HTTP |
| reason | dapr.reason | custom | failure reason for component init, mTLS, etc. |
| trustDomain | dapr.trust_domain | - | - |
| policyAction | dapr.policy_action | true;false | request policy (ACL, RBAC) |
| dapr.invoke_method | dapr.invoke_method | custom | service invocation method |
| dapr.status_code | dapr.status_code | custom, defined by the protocol itself | Dapr RPC response status |
| dapr.protocol | dapr.protocol | http;grpc | RPC protocol in the Dapr ecosystem |
| grpc_server_method | Deprecated | - | gRPC proto API |
| grpc_server_status | Deprecated | - | gRPC status code |
| grpc_client_method | Deprecated | - | gRPC proto API |
| grpc_client_status | Deprecated | - | - |
| name | dapr.resiliency.name | custom value | resiliency name |
| policy | dapr.resiliency.policy | circuitbreaker, retry, timeout | resiliency policy |
| actor_type | dapr.actor.type | - | actor type |
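
The renaming in the two tables above could be applied mechanically when labels are attached to a measurement. The following is a minimal sketch, assuming a plain `map[string]string` of labels; `legacyToOTel` and `RenameLabels` are hypothetical names that do not appear in the proposal, and the map covers only a few unambiguous rows.

```go
package main

import "fmt"

// legacyToOTel maps some current label names from the tables above to the
// proposed attribute names. Illustrative only, not exhaustive.
var legacyToOTel = map[string]string{
	"app_id":         "service.name",
	"namespace":      "service.namespace",
	"componentName":  "dapr.component.name",
	"component":      "dapr.component.type",
	"operation":      "dapr.component.operation",
	"topic":          "dapr.component.pubsub.topic",
	"process_status": "dapr.component.pubsub.process_status",
	"actor_type":     "dapr.actor.type",
}

// RenameLabels (hypothetical helper) returns a copy of labels in which known
// legacy keys are replaced by the proposed names. Unknown keys pass through.
func RenameLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if renamed, ok := legacyToOTel[k]; ok {
			k = renamed
		}
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(RenameLabels(map[string]string{"app_id": "orders", "topic": "checkout"}))
}
```

Generic names such as `name` and `policy` are left out of the map on purpose, since they would need context to be renamed safely.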

Metrics

HTTP

| current name | proposed name | type | description |
| --- | --- | --- | --- |
| http_server_request_count | dapr_http_server_request_count | counter(int64) | Number of HTTP requests started in server. |
| http_server_request_bytes | dapr_http_server_request_bytes | histogram(int64) | HTTP request body size if set as ContentLength (uncompressed) in server. |
| http_server_response_bytes | dapr_http_server_response_bytes | histogram(int64) | HTTP response body size (uncompressed) in server. |
| http_server_latency | dapr_http_server_latency | histogram(float64) | HTTP request end to end latency in server. |
| http_server_response_count | dapr_http_server_response_count | counter(int64) | The number of HTTP responses. |
| http_client_sent_bytes | dapr_http_client_sent_bytes | histogram(int64) | Total bytes sent in request body (not including headers). |
| http_client_received_bytes | dapr_http_client_received_bytes | histogram(int64) | Total bytes received in response bodies (not including headers but including error responses with bodies). |
| http_client_roundtrip_latency | dapr_http_client_roundtrip_latency | histogram(float64) | Time between first byte of request headers sent to last byte of response received, or terminal error. |
| http_client_completed_count | dapr_http_client_completed_count | counter(int64) | Count of completed requests. |
| http_healthprobes_completed_count | dapr_http_healthprobes_completed_count | counter(int64) | Count of completed health probes. |
| http_healthprobes_roundtrip_latency | dapr_http_healthprobes_roundtrip_latency | histogram(float64) | Time between first byte of health probes headers sent to last byte of response received, or terminal error. |

gRPC

| current name | proposed name | type | description |
| --- | --- | --- | --- |
| grpc_io_server_received_bytes_per_rpc | dapr_grpc_io_server_received_bytes_per_rpc | histogram(int64) | Total bytes received across all messages per RPC. |
| grpc_io_server_sent_bytes_per_rpc | dapr_grpc_io_server_sent_bytes_per_rpc | histogram(int64) | Total bytes sent across all response messages per RPC. |
| grpc_io_server_server_latency | dapr_grpc_io_server_server_latency | histogram(float64) | Time between first byte of request received to last byte of response sent, or terminal error. |
| grpc_io_server_completed_rpcs | dapr_grpc_io_server_completed_rpcs | counter(int64) | Count of RPCs by method and status. |
| grpc_io_client_sent_bytes_per_rpc | dapr_grpc_io_client_sent_bytes_per_rpc | histogram(int64) | Total bytes sent across all request messages per RPC. |
| grpc_io_client_received_bytes_per_rpc | dapr_grpc_io_client_received_bytes_per_rpc | histogram(int64) | Total bytes received across all response messages per RPC. |
| grpc_io_client_roundtrip_latency | dapr_grpc_io_client_roundtrip_latency | histogram(float64) | Time between first byte of request sent to last byte of response received, or terminal error. |
| grpc_io_client_completed_rpcs | dapr_grpc_io_client_completed_rpcs | counter(int64) | Count of RPCs by method and status. |

Components

| current name | proposed name | type | description |
| --- | --- | --- | --- |
| component_pubsub_ingress_count | dapr_component_pubsub_ingress_count | counter(int64) | The number of incoming messages arriving from the pub/sub component. |
| component_pubsub_ingress_latencies | dapr_component_pubsub_ingress_latencies | histogram(float64) | The consuming app event processing latency. |
| component_pubsub_egress_count | dapr_component_pubsub_egress_count | counter(int64) | The number of outgoing messages published to the pub/sub component. |
| component_pubsub_egress_latencies | dapr_component_pubsub_egress_latencies | histogram(float64) | The latency of the response from the pub/sub component. |
| component_input_binding_count | dapr_component_input_binding_count | counter(int64) | The number of incoming events arriving from the input binding component. |
| component_input_binding_latencies | dapr_component_input_binding_latencies | histogram(float64) | The triggered app event processing latency. |
| component_output_binding_count | dapr_component_output_binding_count | counter(int64) | The number of operations invoked on the output binding component. |
| component_output_binding_latencies | dapr_component_output_binding_latencies | histogram(float64) | The latency of the response from the output binding component. |
| component_state_count | dapr_component_state_count | counter(int64) | The number of operations performed on the state component. |
| component_state_latencies | dapr_component_state_latencies | histogram(float64) | The latency of the response from the state component. |
| component_configuration_count | dapr_component_configuration_count | counter(int64) | The number of operations performed on the configuration component. |
| component_configuration_latencies | dapr_component_configuration_latencies | histogram(float64) | The latency of the response from the configuration component. |
| component_secret_count | dapr_component_secret_count | counter(int64) | The number of operations performed on the secret component. |
| component_secret_latencies | dapr_component_secret_latencies | histogram(float64) | The latency of the response from the secret component. |

Runtime

| current name | proposed name | type | description |
| --- | --- | --- | --- |
| runtime_component_loaded | dapr_runtime_component_loaded | counter(int64) | The number of successfully loaded components. |
| runtime_component_init_total | dapr_runtime_component_init_total | counter(int64) | The number of initialized components. |
| runtime_component_init_fail_total | dapr_runtime_component_init_fail_total | counter(int64) | The number of component initialization failures. |
| runtime_mtls_init_total | dapr_runtime_mtls_init_total | counter(int64) | The number of successful mTLS authenticator initializations. |
| runtime_mtls_init_fail_total | dapr_runtime_mtls_init_fail_total | counter(int64) | The number of mTLS authenticator init failures. |
| runtime_mtls_workload_cert_rotated_total | dapr_runtime_mtls_workload_cert_rotated_total | counter(int64) | The number of successful workload certificate rotations. |
| runtime_mtls_workload_cert_rotated_fail_total | dapr_runtime_mtls_workload_cert_rotated_fail_total | counter(int64) | The number of failed workload certificate rotations. |
| runtime_actor_status_report_total | dapr_runtime_actor_status_report_total | counter(int64) | The number of successful status reports to the placement service. |
| runtime_actor_status_report_fail_total | dapr_runtime_actor_status_report_fail_total | counter(int64) | The number of failed status reports to the placement service. |
| runtime_actor_table_operation_recv_total | dapr_runtime_actor_table_operation_recv_total | counter(int64) | The number of received actor placement table operations. |
| runtime_actor_rebalanced_total | dapr_runtime_actor_rebalanced_total | counter(int64) | The number of actor rebalance requests. |
| runtime_actor_deactivated_total | dapr_runtime_actor_deactivated_total | counter(int64) | The number of successful actor deactivations. |
| runtime_actor_pending_actor_calls | dapr_runtime_actor_pending_actor_calls | counter(int64) | The number of pending actor calls waiting to acquire the per-actor lock. |
| runtime_acl_app_policy_action_allowed_total | dapr_runtime_acl_app_policy_action_allowed_total | counter(int64) | The number of requests allowed by the app specific action specified in the access control policy. |
| runtime_acl_global_policy_action_allowed_total | dapr_runtime_acl_global_policy_action_allowed_total | counter(int64) | The number of requests allowed by the global action specified in the access control policy. |
| runtime_acl_app_policy_action_blocked_total | dapr_runtime_acl_app_policy_action_blocked_total | counter(int64) | The number of requests blocked by the app specific action specified in the access control policy. |
| runtime_acl_global_policy_action_blocked_total | dapr_runtime_acl_global_policy_action_blocked_total | counter(int64) | The number of requests blocked by the global action specified in the access control policy. |

Resiliency

| current name | proposed name | type | description |
| --- | --- | --- | --- |
| resiliency_loaded | dapr_resiliency_loaded | counter(int64) | Number of resiliency policies loaded. |
| resiliency_count | dapr_resiliency_count | counter(int64) | Number of times a resiliency policyKey has been executed. |
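
Across all the metric tables above, the proposed name is the current name with a `dapr_` prefix. A small sketch of that rule, assuming it holds for every metric; `PrefixedName` is a hypothetical helper, not something defined in the proposal:

```go
package main

import (
	"fmt"
	"strings"
)

// daprPrefix is the prefix every proposed metric name in the tables carries.
const daprPrefix = "dapr_"

// PrefixedName (hypothetical helper) returns the proposed metric name for a
// current one. It is idempotent: already-prefixed names are returned as-is.
func PrefixedName(current string) string {
	if strings.HasPrefix(current, daprPrefix) {
		return current
	}
	return daprPrefix + current
}

func main() {
	for _, name := range []string{"http_server_latency", "resiliency_count"} {
		fmt.Println(name, "->", PrefixedName(name))
	}
}
```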

1046102779, Sep 22 '22 09:09

OpenTelemetry Init Configuration

apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: configuration
  namespace: default
spec:
  metric:
    enabled: true
    exporterAddress: "9.xxx.xxx.x8:4318"
  tracing:
    enabled: true
    samplingRate: 1
    exporterAddress: "9.xxx.xxx.x8:4317"
  log:
    file_enabled: false
    log_level: 0
    trace_enabled: true
    encode_logs_as_json: true
  nameresolution:
    nr_component_name: "nameresolution-polaris"
    service: "dapr.gbot.sqqgroup"
    namespace: "Production"
    metadata:
      token: "77001b56xxxxxeb69ab0431836e9be"
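
A configuration like the one above would typically be checked before any exporter is created. The sketch below is one way to do that for the `spec.tracing` block only. The `TracingSpec` type, its field names, and the rule that `samplingRate` lies in [0, 1] are assumptions made for illustration and are not taken from the Dapr code base.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// TracingSpec mirrors the spec.tracing block; type and field names are assumed.
type TracingSpec struct {
	Enabled         bool
	SamplingRate    float64
	ExporterAddress string
}

// Validate reports the first problem found in an enabled tracing spec.
func (t TracingSpec) Validate() error {
	if !t.Enabled {
		return nil // nothing to check when tracing is off
	}
	if t.SamplingRate < 0 || t.SamplingRate > 1 {
		return fmt.Errorf("samplingRate %v is outside [0, 1]", t.SamplingRate)
	}
	host, port, err := net.SplitHostPort(t.ExporterAddress)
	if err != nil {
		return fmt.Errorf("exporterAddress %q: %w", t.ExporterAddress, err)
	}
	if host == "" || port == "" {
		return errors.New("exporterAddress must have the form host:port")
	}
	return nil
}

func main() {
	spec := TracingSpec{Enabled: true, SamplingRate: 1, ExporterAddress: "localhost:4317"}
	fmt.Println("validation error:", spec.Validate())
}
```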

1046102779, Sep 22 '22 09:09

Metric Init

package diagnostics

import (
        "context"
        "time"

        "github.com/pkg/errors"
        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/exporters/otlp/otlpmetric"
        "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
        "go.opentelemetry.io/otel/metric"
        "go.opentelemetry.io/otel/metric/global"
        "go.opentelemetry.io/otel/sdk/metric/aggregator/histogram"
        controller "go.opentelemetry.io/otel/sdk/metric/controller/basic"
        processor "go.opentelemetry.io/otel/sdk/metric/processor/basic"
        "go.opentelemetry.io/otel/sdk/metric/selector/simple"
        "go.opentelemetry.io/otel/sdk/resource"
        semconv "go.opentelemetry.io/otel/semconv/v1.9.0"
)
var (
        // DefaultReportingPeriod is the default view reporting period.
        DefaultReportingPeriod = 60 * time.Second

        // DefaultMonitoring holds service monitoring metrics definitions.
        DefaultMonitoring *serviceMetrics
        // DefaultGRPCMonitoring holds default gRPC monitoring handlers and middlewares.
        DefaultGRPCMonitoring *grpcMetrics
        // DefaultHTTPMonitoring holds default HTTP monitoring handlers and middlewares.
        DefaultHTTPMonitoring *httpMetrics
        // DefaultComponentMonitoring holds component specific metrics.
        DefaultComponentMonitoring *componentMetrics
        // DefaultTRPCMonitoring holds default tRPC monitoring handlers and middlewares.
        DefaultTRPCMonitoring *trpcMetrics
)

// MetricClient is a metric client.
type MetricClient struct {
        AppID     string
        Namespace string
        // Address collector receiver address.
        Address string

        meter    metric.Meter
        pusher   *controller.Controller
        exporter *otlpmetric.Exporter
}

// InitMetrics initializes metrics.
func InitMetrics(address, appID, namespace string) (*MetricClient, error) {
        var err error
        if address == "" {
                address = defaultMetricExporterAddr
        }
        client := &MetricClient{
                AppID:     appID,
                Namespace: namespace,
                Address:   address,
        }
        if err = client.init(); err != nil {
                return nil, err
        }
        DefaultMonitoring = client.newServiceMetrics()
        DefaultGRPCMonitoring = client.newGRPCMetrics()
        DefaultHTTPMonitoring = client.newHTTPMetrics()
        DefaultTRPCMonitoring = client.newTRPCMetrics()
        DefaultComponentMonitoring = client.newComponentMetrics()

        return client, nil
}

// ::TODO  https://github.com/open-telemetry/opentelemetry-collector/issues/5238
func (m *MetricClient) init() error {
        var err error
        ctx := context.Background()
        client := otlpmetrichttp.NewClient(
                otlpmetrichttp.WithInsecure(),
                otlpmetrichttp.WithEndpoint(m.Address))
        m.exporter, err = otlpmetric.New(ctx, client)
        if err != nil {
                return errors.Errorf("Failed to create the collector exporter: %v", err)
        }
        res, err := resource.New(ctx,
                resource.WithAttributes(
                        semconv.ServiceNameKey.String(m.AppID),
                        semconv.ServiceNamespaceKey.String(m.Namespace),
                ),
        )
        if err != nil {
                return errors.Errorf("failed to create the metric resource: %v", err)
        }
        // TODO: https://github.com/open-telemetry/opentelemetry-go/issues/2678
        // Placeholder histogram bucket boundaries.
        bounds := []float64{5, 10, 50, 100, 150, 200, 250, 300, 350, 400, 500, 600, 700, 800, 900, 1000}
        m.pusher = controller.New(
                processor.NewFactory(
                        simple.NewWithHistogramDistribution(
                                histogram.WithExplicitBoundaries(
                                        bounds)),
                        m.exporter,
                ),
                controller.WithExporter(m.exporter),
                controller.WithCollectPeriod(DefaultReportingPeriod),
                controller.WithCollectTimeout(30*time.Second),
                controller.WithPushTimeout(30*time.Second),
                controller.WithResource(res))
        global.SetMeterProvider(m.pusher)
        // Use a single global meter rather than multiple meters.
        m.meter = global.Meter("mecha",
                metric.WithInstrumentationVersion("v0.27.0"),
                metric.WithSchemaURL("go.opentelemetry.io/otel/metric"))

        if err := m.pusher.Start(ctx); err != nil {
                return errors.Errorf("could not start metric controller: %v", err)
        }
        return nil
}

// Close shuts down the metric client.
func (m *MetricClient) Close() error {
        if m == nil {
                return nil
        }
        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        defer cancel()
        if err := m.pusher.Stop(ctx); err != nil {
                otel.Handle(err)
        }
        if err := m.exporter.Shutdown(ctx); err != nil {
                otel.Handle(err)
        }
        return nil
}
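
The explicit boundaries passed to the histogram aggregator in `init` decide which bucket each recorded value falls into. The sketch below shows one way to compute that bucket with the standard library, assuming the upper-inclusive convention of the OTLP explicit-bucket histogram (bucket `i` covers `(bounds[i-1], bounds[i]]`). Whether the SDK version used in this issue follows the same convention is not verified here, and `BucketIndex` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// bounds repeats the placeholder boundaries from the metric init code.
var bounds = []float64{5, 10, 50, 100, 150, 200, 250, 300, 350, 400, 500, 600, 700, 800, 900, 1000}

// BucketIndex returns the index of the bucket that v falls into. Index i
// covers (bounds[i-1], bounds[i]]; index len(bounds) is the overflow bucket.
// bounds must be sorted in ascending order.
func BucketIndex(bounds []float64, v float64) int {
	// SearchFloat64s returns the smallest i with bounds[i] >= v.
	return sort.SearchFloat64s(bounds, v)
}

func main() {
	for _, latency := range []float64{3, 5, 7, 1000, 1500} {
		fmt.Printf("latency %v -> bucket %d\n", latency, BucketIndex(bounds, latency))
	}
}
```

With 16 boundaries there are 17 buckets, so any latency above 1000 lands in the final overflow bucket and loses resolution.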

1046102779, Sep 22 '22 09:09

Trace Init

package diagnostics

import (
        "context"
        "time"

        "github.com/pkg/errors"

        // We currently don't depend on the Otel SDK since it has not GAed.
        // This package, however, only contains the conventions from the Otel Spec,
        // which we do depend on.
        itrace "github.com/dapr/dapr/pkg/diagnostics/sdk/trace"
        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/attribute"
        "go.opentelemetry.io/otel/exporters/otlp/otlptrace"

        "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
        //"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
        "go.opentelemetry.io/otel/propagation"
        "go.opentelemetry.io/otel/sdk/resource"
        sdktrace "go.opentelemetry.io/otel/sdk/trace"
        semconv "go.opentelemetry.io/otel/semconv/v1.9.0"
        apitrace "go.opentelemetry.io/otel/trace"
)

const (
        // daprInternalSpanAttrPrefix is the prefix for internal span attributes.
        // Middleware will not populate an attribute whose key starts with this prefix.
        daprInternalSpanAttrPrefix = "__dapr."
        // daprAPISpanNameInternalKey is an internal attribute key; it is not
        // propagated to span attributes.
        daprAPISpanNameInternalKey = attribute.Key(daprInternalSpanAttrPrefix + "spanname")

        daprRPCServiceInvocationService = "ServiceInvocation"
        daprRPCDaprService              = "Dapr"
        // Traces are exported over OTLP/gRPC (default port 4317),
        // metrics over OTLP/HTTP (default port 4318).
        defaultTracingExporterAddr      = "localhost:4317"
        defaultMetricExporterAddr       = "localhost:4318"
)

var defaultTracer apitrace.Tracer = apitrace.
        NewNoopTracerProvider().Tracer("")

// StartInternalCallbackSpan starts trace span for internal callback such as input bindings and pubsub subscription.
func StartInternalCallbackSpan(ctx context.Context, spanName string, parent apitrace.SpanContext) (context.Context, apitrace.Span) {
        ctx = apitrace.ContextWithRemoteSpanContext(ctx, parent)
        return defaultTracer.Start(ctx, spanName,
                apitrace.WithSpanKind(apitrace.SpanKindServer))
}

type TracingClient struct {
        AppID     string
        Namespace string
        // Address collector receiver address.
        Address string
        Token   string

        sampleRatio float64
        exporter    *otlptrace.Exporter
        provider    *sdktrace.TracerProvider
}

// InitTracing initializes the tracing client.
func InitTracing(address, token, appID, namespace string, sampleRatio float64) (*TracingClient, error) {
        if address == "" {
                address = defaultTracingExporterAddr
        }
        client := &TracingClient{
                Address:     address,
                Token:       token,
                AppID:       appID,
                Namespace:   namespace,
                sampleRatio: sampleRatio,
        }
        if err := client.init(); err != nil {
                return nil, err
        }
        return client, nil
}

func (t *TracingClient) init() error {
        var err error
        client := otlptracegrpc.NewClient(
                otlptracegrpc.WithEndpoint(t.Address),
                otlptracegrpc.WithInsecure(),
        )
        t.exporter, err = otlptrace.New(context.Background(), client)
        if err != nil {
                return errors.Errorf("creating OTLP trace exporter: %v", err)
        }

        ssp := sdktrace.NewBatchSpanProcessor(t.exporter,
                sdktrace.WithBatchTimeout(5*time.Second))
        t.provider = sdktrace.NewTracerProvider(
                sdktrace.WithSpanProcessor(ssp),
                sdktrace.WithSampler(itrace.TraceIDBasedParentAndRatio(t.sampleRatio)),
                sdktrace.WithResource(
                        resource.NewWithAttributes(
                                semconv.SchemaURL,
                                semconv.ServiceNameKey.String(t.AppID),
                                semconv.ServiceNamespaceKey.String(t.Namespace),
                                attribute.String("token", t.Token),
                        )),
        )
        otel.SetTracerProvider(t.provider)
        otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))

        defaultTracer = otel.GetTracerProvider().Tracer(
                "mecha",
                apitrace.WithInstrumentationVersion("v0.27.0"),
                apitrace.WithSchemaURL("go.opentelemetry.io/otel/tracing"),
        )

        return nil
}

// Close shuts down the tracing client.
func (t *TracingClient) Close() error {
        if t == nil {
                return nil
        }

        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        defer cancel()
        if err := t.provider.Shutdown(ctx); err != nil {
                return err
        }
        if err := t.exporter.Shutdown(ctx); err != nil {
                return err
        }
        return nil
}
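
The tracer provider above samples by trace ID and ratio through `itrace.TraceIDBasedParentAndRatio`, whose implementation is not shown in this issue. As a rough illustration of the ratio part only, the sketch below follows the general approach of a trace-ID ratio sampler: part of the trace ID is interpreted as an integer and compared against a threshold derived from the ratio. `ShouldSample` is a hypothetical function, and parent-based decisions are left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ShouldSample (hypothetical) decides deterministically from the trace ID
// whether a trace is sampled, so every service that sees the same trace ID
// and ratio reaches the same decision.
func ShouldSample(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	// Threshold in the 63-bit range; the low 8 bytes of the trace ID are
	// shifted right by one bit so they fall in the same range.
	threshold := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < threshold
}

func main() {
	var id [16]byte
	id[15] = 0x2a
	fmt.Println("sampled at ratio 0.5:", ShouldSample(id, 0.5))
}
```

Deriving the decision from the trace ID rather than from a random number keeps the sampling consistent across all spans of one trace.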

1046102779, Sep 22 '22 09:09

Looks good overall, cc @yaron2 and @artursouza for more feedback.

daixiang0, Oct 25 '22 02:10

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

dapr-bot, Dec 24 '22 02:12