
Support receiving logs in Loki using OpenTelemetry OTLP

Open js8080 opened this issue 3 years ago • 20 comments

Is your feature request related to a problem? Please describe. I am running Grafana Loki inside a Kubernetes cluster but I have some applications running outside the cluster and I want to get logging data from those applications into Loki without relying on custom APIs or file-based logging.

Describe the solution you'd like OpenTelemetry describes a number of approaches including using the OpenTelemetry Collector. The OpenTelemetry Collector supports various types of exporters and the OTLP exporter supports logs, metrics, and traces. Tempo supports receiving trace data via OTLP and it would be great if Loki also had support for receiving log data via OTLP. This way, people could run the OpenTelemetry Collector next to their applications and send logs into Loki in a standard way using the OpenTelemetry New First-Party Application Logs recommendations.

Currently, unless I am misunderstanding the Loki documentation, it seems the only API into Loki is custom:

Details on the OTLP specification:

  • OTLP/gRPC: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md#otlpgrpc
  • OTLP/HTTP: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md#otlphttp

Describe alternatives you've considered There are a number of Loki Clients that one can use to get logs into Loki but they all seem to involve using the custom Loki push API or reading from log files. Supporting the OpenTelemetry Collector would allow following the OpenTelemetry New First-Party Application Logs recommendations
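For comparison, the kind of setup I'm stuck with today looks roughly like the Promtail sketch below (addresses are placeholders): tailing files and pushing them to Loki's own /loki/api/v1/push endpoint, which is exactly the custom-API plus file-based combination I'd like to avoid:

server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  # Loki's custom push API
  - url: http://loki.example.com:3100/loki/api/v1/push
scrape_configs:
  - job_name: my-app
    static_configs:
      - targets: [localhost]
        labels:
          job: my-app
          __path__: /var/log/my-app/*.log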

js8080 avatar Feb 08 '22 15:02 js8080

Done: https://github.com/grafana/loki/pull/5363

1. Grafana OTLP log view (screenshot)
2. Go client go.mod dependency:

	go.opentelemetry.io/collector/model v0.44.0

demo go client code:

// Demo test client; the package name is arbitrary.
package otlpdemo

import (
	"context"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/collector/model/otlpgrpc"
	"go.opentelemetry.io/collector/model/pdata"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func TestGrpcClient(t *testing.T) {
	grpcEndpoint := "localhost:4317"

	// Dial the OTLP/gRPC endpoint without TLS.
	conn, err := grpc.Dial(grpcEndpoint, grpc.WithTransportCredentials(insecure.NewCredentials()))
	require.NoError(t, err)

	// Export a small batch of log records.
	client := otlpgrpc.NewLogsClient(conn)
	request := makeRequest()
	_, err = client.Export(context.Background(), request)
	require.NoError(t, err)
}

func makeRequest() otlpgrpc.LogsRequest {
	request := otlpgrpc.NewLogsRequest()
	pLog := pdata.NewLogs()

	// Resource-level attributes identify the application producing the logs.
	rl := pLog.ResourceLogs().AppendEmpty()
	rl.Resource().Attributes().InsertString("app", "testApp")

	ilm := rl.InstrumentationLibraryLogs().AppendEmpty()
	ilm.InstrumentationLibrary().SetName("testName")

	now := time.Now()

	// First record: WARN severity.
	logRecord := ilm.LogRecords().AppendEmpty()
	logRecord.SetName("testName")
	logRecord.SetFlags(31)
	logRecord.SetSeverityNumber(13) // OTLP SEVERITY_NUMBER_WARN
	logRecord.SetSeverityText("WARN")
	logRecord.SetSpanID(pdata.NewSpanID([8]byte{1, 2}))
	logRecord.SetTraceID(pdata.NewTraceID([16]byte{1, 2, 3, 4}))
	logRecord.Attributes().InsertString("level", "WARN")
	logRecord.SetTimestamp(pdata.NewTimestampFromTime(now))

	// Second record: INFO severity.
	logRecord2 := ilm.LogRecords().AppendEmpty()
	logRecord2.SetName("testName")
	logRecord2.SetFlags(31)
	logRecord2.SetSeverityNumber(9) // OTLP SEVERITY_NUMBER_INFO
	logRecord2.SetSeverityText("INFO")
	logRecord2.SetSpanID(pdata.NewSpanID([8]byte{3, 4}))
	logRecord2.SetTraceID(pdata.NewTraceID([16]byte{1, 2, 3, 4}))
	logRecord2.Attributes().InsertString("level", "INFO")
	logRecord2.SetTimestamp(pdata.NewTimestampFromTime(now))

	request.SetLogs(pLog)
	return request
}

liguozhong avatar Feb 10 '22 16:02 liguozhong

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly review closed issues that have a stale label, sorted by thumbs-up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.

stale[bot] avatar Apr 17 '22 07:04 stale[bot]

May I ask, what's the current state here? :)

frzifus avatar Aug 04 '23 14:08 frzifus

@frzifus The status is that we still lack the API, but the key storage issue is being addressed by non-indexed labels (see the upcoming docs PR: https://github.com/grafana/loki/pull/10073). As @slim-bean mentioned in his earlier comment, we need efficient storage for the OTLP labels, and AFAIU, as mentioned in the last NASA community call, non-indexed labels are close.

periklis avatar Aug 07 '23 09:08 periklis

Any update on the native OTLP support?

madhub avatar Nov 23 '23 07:11 madhub

@sandeepsukhani might know a thing or two about this :-)

jpkrohling avatar Nov 23 '23 09:11 jpkrohling

Hey folks, we have added experimental OTLP log ingestion support to Loki. It has not been released yet, so you would have to use the latest main to try it. You can read more about it in the docs. Please give it a try in your dev environments and share any feedback or suggestions.
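For a quick test, a minimal OpenTelemetry Collector pipeline pointing at Loki would look roughly like this (a sketch; replace <loki-addr> with your Loki address and double-check the endpoint against the docs above):

receivers:
  otlp:
    protocols:
      http: {}
exporters:
  otlphttp:
    # Base path only; the otlphttp exporter appends /v1/logs itself.
    endpoint: http://<loki-addr>:3100/otlp
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp]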

sandeepsukhani avatar Nov 24 '23 09:11 sandeepsukhani

Hi, really looking forward to that feature :)

I saw that service.instance.id will be considered a label; doesn't this have the potential to be a high-cardinality value?

Also, will it be possible to customize the "labels" list? In our case we run Nomad, so the k8s.* resource attributes wouldn't really work for us, but we would have resource attributes like nomad.job.name which would make sense for us as labels.
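Something along these lines is what I'd imagine (just a sketch of how such a per-tenant setting could look; I haven't verified the exact keys against Loki, so treat them as illustrative):

limits_config:
  otlp_config:
    resource_attributes:
      attributes_config:
        # Promote a Nomad resource attribute to an index label;
        # everything not listed would stay as structured metadata.
        # Illustrative keys, not verified against an actual Loki release.
        - action: index_label
          attributes:
            - nomad.job.name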

mxab avatar Dec 04 '23 14:12 mxab

@sandeepsukhani looks good, I'll give it a try next week.

One immediate suggestion is that I'd like to be able to configure the indexed labels so I can add/remove items from the list. Perhaps it should default to the list you have in the docs and then the user can provide their own list to override it.

Also, I see the span_id and trace_id are currently metadata; shouldn't the trace_id at least be indexed so I can correlate logs to traces?

Another suggestion: the conversion adds a 'severity_number' metadata attribute, which is not very useful; instead it should be mapped to a 'level' field, like the OpenTelemetry Collector's Loki translator does: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/translator/loki/logs_to_loki.go.
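As a workaround on the collector side, something like the contrib transform processor should be able to copy the severity text into a level attribute before export (untested sketch):

processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Derive a conventional 'level' attribute from the OTLP severity text.
          - set(attributes["level"], severity_text)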

bouk avatar Dec 16 '23 12:12 bouk

Hi, can I ask if there are plans to support gRPC? Or maybe I missed some documentation and it's actually supported now?

DengYiPeng avatar Apr 24 '24 06:04 DengYiPeng

Is this supported in Loki v3? I get a 404 error when calling the endpoint.

gkaskonas avatar Jun 05 '24 08:06 gkaskonas

Does anyone know if this is now possible?

I have an OTEL collector running on a k8s cluster which I would like to gather logs from and send over to a remote loki stack running on another k8s cluster. I'm hoping to achieve this via OTLP HTTP, and there is documentation that seems to indicate that this is possible. However, after following the documentation I haven't had any success. Sending logs from the OTEL collector to a remote Loki instance should be possible through OTLP HTTP, right?

alextricity25 avatar Jun 11 '24 15:06 alextricity25

Yes, Loki v3 includes an OTLP port to ingest OTLP Logs natively.

jpkrohling avatar Jun 20 '24 13:06 jpkrohling

Does anyone know if this is now possible?

I have an OTEL collector running on a k8s cluster which I would like to gather logs from and send over to a remote loki stack running on another k8s cluster. I'm hoping to achieve this via OTLP HTTP, and there is documentation that seems to indicate that this is possible. However, after following the documentation I haven't had any success. Sending logs from the OTEL collector to a remote Loki instance should be possible through OTLP HTTP, right?

A quick update on this - I was able to receive logs via OTLP HTTP successfully. Turned out to be a mistake with my config.

alextricity25 avatar Jun 20 '24 13:06 alextricity25

Solved

Well, never mind. I eventually found out I was just using the wrong version of Loki (grafana/loki instead of grafana/loki:3.0.0) and the OTLP endpoint wasn't available yet. So if you have the issue described below, just upgrade 🤷


Original Problem

A quick update on this - I was able to receive logs via OTLP HTTP successfully. Turned out to be a mistake with my config.

@alextricity25 Would you mind sharing how you got this to work? I am currently stuck at a stage where the collector gives me this error:

2024-06-23T17:30:51.116Z	error	exporterhelper/queue_sender.go:90	Exporting failed. Dropping data.	{"kind": "exporter", "data_type": "logs", "name": "otlphttp", "error": "not retryable error: Permanent error: rpc error: code = Unimplemented desc = error exporting items, request to http://loki.telemetry.svc.cluster.local:3100/otlp/v1/logs responded with HTTP Status Code 404", "dropped_items": 10}
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
	go.opentelemetry.io/collector/[email protected]/exporterhelper/queue_sender.go:90
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
	go.opentelemetry.io/collector/[email protected]/internal/queue/bounded_memory_queue.go:52
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
	go.opentelemetry.io/collector/[email protected]/internal/queue/consumers.go:43

For reference, this is the config for the collector I am currently deploying using the operator:

# OpenTelemetry Operator
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: telemetry
spec:
  image: otel/opentelemetry-collector-contrib:0.103.0
  serviceAccount: otel-collector
  mode: daemonset
  volumeMounts:
    # Mount the volumes to the collector container
    - name: varlogpods
      mountPath: /var/log/pods
      readOnly: true
    - name: varlibdockercontainers
      mountPath: /var/lib/docker/containers
      readOnly: true
  volumes:
    # Typically the collector will want access to pod logs and container logs
    - name: varlogpods
      hostPath:
        path: /var/log/pods
    - name: varlibdockercontainers
      hostPath:
        path: /var/lib/docker/containers
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
      filelog:
        include_file_path: true
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          - /var/log/pods/telemetry_otel-collector*/*/*.log
        operators:
          - id: container-parser
            type: container
    processors:
      batch: {}
    exporters:
      logging:
        loglevel: debug
      otlphttp:
        endpoint: http://loki.telemetry.svc.cluster.local:3100/otlp
        compression: none
        tls:
          insecure: true
      prometheus:
        endpoint: "0.0.0.0:8889"
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheus]
        logs:
          receivers: [otlp,filelog]
          processors: [batch]
          exporters: [logging, otlphttp]

And the Loki config:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9095

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2024-04-01
      object_store: s3
      store: tsdb
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index
    cache_location: /loki/tsdb-cache
  aws:
    s3: s3://minioadmin:[email protected]:9000/loki-data
    s3forcepathstyle: true

limits_config:
  retention_period: 744h
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_global_streams_per_user: 5000
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  allow_structured_metadata: true

# chunk_store_config:
#   max_look_back_period: 744h

table_manager:
  retention_deletes_enabled: true
  retention_period: 744h

ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  # alertmanager_url: http://alertmanager:9093 TODO deploy alertmanager
  ring:
    kvstore:
      store: inmemory
  enable_api: true

query_scheduler:
  max_outstanding_requests_per_tenant: 2048

frontend:
  max_outstanding_per_tenant: 2048
  compress_responses: true

This looks suspiciously like the OTel Collector is still using gRPC, but this is exactly what the docs tell me to do, so I am clueless here. Any help would be appreciated.

leonhma avatar Jun 23 '24 18:06 leonhma

The documentation lists http://<loki-addr>:3100/otlp as the endpoint for the otlphttp exporter, but the actual endpoint is http://<loki-addr>:3100/otlp/v1/logs; this explains the 404.

leahneukirchen avatar Jun 24 '24 15:06 leahneukirchen

Well, never mind. I eventually found out I was just using the wrong version of Loki (grafana/loki instead of grafana/loki:3.0.0) and the OTLP endpoint wasn't available yet. So if you have the issue described below, just upgrade 🤷

The official docker-compose.yaml still contains the old image version:

https://github.com/grafana/loki/blob/0a7e9133590ffb361b9c4eb6c4b8a5b772d83676/production/docker-compose.yaml#L6-L8

The documentation lists http://<loki-addr>:3100/otlp as the endpoint for the otlphttp exporter, but the actual endpoint is http://<loki-addr>:3100/otlp/v1/logs; this explains the 404.

Indeed, I found the /otlp reference in https://grafana.com/docs/loki/latest/send-data/. From https://grafana.com/docs/loki/latest/reference/loki-http-api/#ingest-logs-using-otlp,

When configuring the OpenTelemetry Collector, you must use endpoint: http://<loki-addr>:3100/otlp, as the collector automatically completes the endpoint. Entering the full endpoint will generate an error.

So there's some inconsistency somewhere in the docs or the examples.

astrojuanlu avatar Jun 25 '24 13:06 astrojuanlu

Using Alloy, endpoint = "http://localhost:3100/otlp" works, but if you want to send logs directly (e.g., from an application's OTLP exporter), you need OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://localhost:3100/otlp/v1/logs
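For example, in a pod spec (or the equivalent environment for your app), that looks like this sketch, reusing the localhost address from above as a placeholder:

env:
  - name: OTEL_EXPORTER_OTLP_LOGS_ENDPOINT
    # Signal-specific endpoint variables are used as-is, so the full /otlp/v1/logs path is required.
    value: http://localhost:3100/otlp/v1/logs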

leahneukirchen avatar Jun 25 '24 13:06 leahneukirchen

The documentation lists http://<loki-addr>:3100/otlp as the endpoint for the otlphttp exporter, but the actual endpoint is http://<loki-addr>:3100/otlp/v1/logs; this explains the 404.

I ran into the same problem. Have you solved it yet?

LookOuta avatar Jul 31 '24 10:07 LookOuta

The documentation lists http://<loki-addr>:3100/otlp as the endpoint for the otlphttp exporter, but the actual endpoint is http://<loki-addr>:3100/otlp/v1/logs; this explains the 404.

I ran into the same problem. Have you solved it yet?

I solved this problem by upgrading the Loki docker container to version 3.1.1.

Zagrebelin avatar Aug 20 '24 12:08 Zagrebelin

Closing this issue with the introduction of the native OTLP endpoint. Please reopen if required :)

Jayclifford345 avatar Sep 03 '24 13:09 Jayclifford345