grafana-loki lacks basic feature of extracting nested json labels

Is your feature request related to a problem? Please describe.

I am running a Java Spring Boot application on AWS ECS and want to ship its logs to Loki/Grafana. To keep each Java stack trace as a single log line in Grafana, I log as JSON to the console using

<encoder class="net.logstash.logback.encoder.LogstashEncoder"/>

The resulting json from AWS ECS looks like this:

{
  "container_id": "ea7b47de37024af8a71629fc4c435e09-285964202",
  "container_name": "backend",
  "ecs_cluster": "dev-fargate",
  "ecs_task_arn": "arn:aws:ecs:eu-central-1:xxx:task/dev-fargate/ea7b47de37024af8a71629fc4c435e09",
  "ecs_task_definition": "dev-backend:72",
  "log": {
    "@timestamp": "2022-08-29T08:13:22.893Z",
    "@version": "1",
    "message": "Running with Spring Boot v2.7.3, Spring v5.3.22",
    "logger_name": "com.example.Application",
    "thread_name": "main",
    "level": "DEBUG",
    "level_value": 10000
  }
}

Describe the solution you'd like

I also want to extract labels from the nested log element (log level, logger name, and so on) and keep only log.message as the log text.

I therefore need a configuration like this (CloudFormation YAML), which is currently not supported:

...
LogConfiguration:
  LogDriver: awsfirelens
  Options:
    Name: grafana-loki
    Url: https://loki:3000/loki/api/v1/push
    Labels: "{source=\"console\"}"
    LabelKeys: container_id,ecs_task_arn,ecs_task_definition,ecs_cluster,container_name,log.level,log.logger_name,log.thread_name
    RemoveKeys: source,log.level_value,log.@version
    LineFormat: key_value
    insecure_skip_verify: true
  SecretOptions:
    - Name: TenantID
      ValueFrom: !Sub arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/config/loki/tenant-id
- Name: log_router
  Image: grafana/fluent-bit-plugin-loki:2.6.1-amd64
  Essential: false
  Memory: 512
  Cpu: 256
  FirelensConfiguration:
    Type: fluentbit
    Options:
      enable-ecs-log-metadata: true
  LogConfiguration:
    LogDriver: awslogs
    Options:
      awslogs-stream-prefix: firelens
      awslogs-group: !Ref LogGroup
      awslogs-region: !Ref AWS::Region
...

Describe alternatives you've considered

I have also tried using LabelMapPath, but this didn't work either. I also tried LineFormat=json, with the same result.

Additional context

This is really a basic feature that I bet a lot of people need!

Another useful option would be to specify the final message target ("log.message" in my case) so that all other properties are ignored. This would greatly simplify the config by avoiding the need to specify lots of "RemoveKeys".
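As an illustration of what such a message-target option could do, here is a minimal Go sketch that picks a single value out of a nested record by a dot-separated path. The function name lookupPath and the assumption that the record is already decoded into map[string]interface{} are hypothetical; neither is taken from the plugin's code or configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupPath follows a dot-separated path such as "log.message" through
// nested objects and reports whether a value was found at the end of it.
// Hypothetical helper: it is not part of the fluent-bit Loki plugin.
func lookupPath(record map[string]interface{}, path string) (interface{}, bool) {
	var current interface{} = record
	for _, part := range strings.Split(path, ".") {
		// Every step except the last must land on a nested object.
		obj, ok := current.(map[string]interface{})
		if !ok {
			return nil, false
		}
		current, ok = obj[part]
		if !ok {
			return nil, false
		}
	}
	return current, true
}

func main() {
	record := map[string]interface{}{
		"container_name": "backend",
		"log": map[string]interface{}{
			"message": "Running with Spring Boot v2.7.3, Spring v5.3.22",
			"level":   "DEBUG",
		},
	}
	if msg, ok := lookupPath(record, "log.message"); ok {
		fmt.Println(msg)
	}
}
```

One design caveat: a key that itself contains a dot cannot be addressed this way, so a real option would need an escaping rule or a different separator.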

Aug 29 '22 08:08 sschmiedleitner

Hi @sschmiedleitner I double checked the code of the fluent-bit plugin, and you are right, nested labels cannot be extracted.

https://github.com/grafana/loki/blob/da6fd014448061df7ca3ffe71e469ab2f2d2e77b/clients/cmd/fluent-bit/loki.go#L160-L179

Maybe you want to give it a try to implement a recursive label extraction, e.g. using dot as object separator?
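A rough sketch of what recursive extraction with a dot separator could look like is shown below. It assumes the record has already been decoded into map[string]interface{}, which may differ from the types the plugin actually receives, and the function name flattenKeys is made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// flattenKeys walks a decoded JSON object and fills out with flat entries
// whose keys join nested object keys with a dot, so that
// {"log": {"level": "DEBUG"}} yields the key "log.level".
// Non-object values are rendered with %v. Hypothetical helper, not plugin code.
func flattenKeys(prefix string, in map[string]interface{}, out map[string]string) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if nested, ok := v.(map[string]interface{}); ok {
			// Recurse into nested objects, carrying the dotted prefix along.
			flattenKeys(key, nested, out)
			continue
		}
		out[key] = fmt.Sprintf("%v", v)
	}
}

func main() {
	record := map[string]interface{}{
		"container_name": "backend",
		"log": map[string]interface{}{
			"level":       "DEBUG",
			"logger_name": "com.example.Application",
		},
	}
	flat := map[string]string{}
	flattenKeys("", record, flat)

	// Sort the keys so the printed order is stable.
	keys := make([]string, 0, len(flat))
	for k := range flat {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, flat[k])
	}
}
```

With a flat map like this, entries such as log.level in LabelKeys could be matched by plain key lookup.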

Aug 30 '22 07:08 chaudum

> Hi @sschmiedleitner I double checked the code of the fluent-bit plugin, and you are right, nested labels cannot be extracted.
>
> https://github.com/grafana/loki/blob/da6fd014448061df7ca3ffe71e469ab2f2d2e77b/clients/cmd/fluent-bit/loki.go#L160-L179
>
> Maybe you want to give it a try to implement a recursive label extraction, e.g. using dot as object separator?

@chaudum Sorry, but not my language ;-)

Aug 30 '22 07:08 sschmiedleitner

@sschmiedleitner No problem. I marked the issue as good first issue, so people from the community can discover it. Just wanted to note that the fluent-bit plugin is quite niche and therefore does not have the highest priority for us.

Aug 30 '22 08:08 chaudum

@sschmiedleitner you could also use the LabelMapPath (docs) configuration option. You would need to build a custom image that contains the JSON, though.

Aug 30 '22 18:08 chaudum

Hello @chaudum. I decided to investigate this issue and look for a good solution using the recursive approach you described above, and I think I have found one. Could you take a look at my example of an implemented solution? If it seems right to you, I would be happy to open a PR. If not, I'll try to find a better solution, so feel free to assign this issue to me if possible. :)

Sep 29 '22 17:09 irwinby

> @sschmiedleitner you could also use the LabelMapPath (docs) configuration option. You would need to build a custom image that contains the JSON, though.

Actually, I tried to avoid building an extra image, as it takes extra effort to find a proper place to host it, etc. Anyway, I have already made a different workaround (extracting fewer labels). Still, my request may be valid for others.

Sep 30 '22 08:09 sschmiedleitner

Hi @chaudum, I see in the code that it is not resolved yet, can I potentially work on it?

Dec 20 '23 23:12 Woojciech

I think more people run into this extra JSON parsing. We are shipping structured logs to Grafana Cloud using the OpenTelemetry Collector (this applies to both the loki and otlphttp exporters). We run into a similar situation where we have to run the JSON parser twice: once over the whole log event and again over the log attribute.

Example Query: {Exporter="OTLP"} | json | line_format "{{ .attributes_log }}" | json

Jan 25 '24 03:01 PowerSurj

I would love to see this feature as well, since I'm running into it.

It would also be nice if it could parse a JSON object agnostic of whether it's been stringified or not.
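To make the request concrete, here is a small Go sketch of parsing that accepts a value whether it is already a decoded object or a string holding encoded JSON. The helper name asObject is hypothetical, and this is only one way such behavior could work, not something Loki or its clients are known to provide.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// asObject returns v as a JSON object whether it arrived already decoded
// or as a string containing an encoded JSON object. Any other input,
// including strings that are not JSON objects, is rejected.
// Hypothetical helper for illustration only.
func asObject(v interface{}) (map[string]interface{}, bool) {
	switch t := v.(type) {
	case map[string]interface{}:
		return t, true
	case string:
		var obj map[string]interface{}
		// Unmarshal fails for plain text and for non-object JSON such as arrays;
		// the nil check rejects the literal "null", which decodes without error.
		if err := json.Unmarshal([]byte(t), &obj); err != nil || obj == nil {
			return nil, false
		}
		return obj, true
	default:
		return nil, false
	}
}

func main() {
	decoded := map[string]interface{}{"level": "DEBUG"}
	stringified := `{"level":"DEBUG"}`

	for _, v := range []interface{}{decoded, stringified, "plain text line"} {
		obj, ok := asObject(v)
		fmt.Println(ok, obj["level"])
	}
}
```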

Feb 29 '24 18:02 glitchwizard

You are supposed to parse those nested logs yourself, for example with Vector's . |= object!(parse_json!(.my_nested_json)). If you add tens or hundreds of nested fields as labels, the Loki index will effectively turn into an inverted index.

If you prefer this kind of log analytics (indexing all fields for search), Loki is not really a good option for you, and you should probably look at other logging platforms, e.g. OpenSearch or Better Stack.

May 13 '24 19:05 hajdukda
