loki
grafana-loki lacks a basic feature: extracting labels from nested JSON
Is your feature request related to a problem? Please describe. I am running a Java Spring Boot application on AWS ECS and want to ship its logs to Loki/Grafana. To get each Java stack trace into Grafana as a single log line, I log JSON to the console using:
```xml
<encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
```
The resulting JSON record from AWS ECS looks like this:
```json
{
  "container_id": "ea7b47de37024af8a71629fc4c435e09-285964202",
  "container_name": "backend",
  "ecs_cluster": "dev-fargate",
  "ecs_task_arn": "arn:aws:ecs:eu-central-1:xxx:task/dev-fargate/ea7b47de37024af8a71629fc4c435e09",
  "ecs_task_definition": "dev-backend:72",
  "log": {
    "@timestamp": "2022-08-29T08:13:22.893Z",
    "@version": "1",
    "message": "Running with Spring Boot v2.7.3, Spring v5.3.22",
    "logger_name": "com.example.Application",
    "thread_name": "main",
    "level": "DEBUG",
    "level_value": 10000
  }
}
```
Describe the solution you'd like. I also want to extract labels from the nested `log` element, i.e. log level, logger name, ..., and keep only `log.message` as the log text.
I therefore need a configuration like the following (CloudFormation YAML), which is currently not supported:
```yaml
...
  LogConfiguration:
    LogDriver: awsfirelens
    Options:
      Name: grafana-loki
      Url: https://loki:3000/loki/api/v1/push
      Labels: "{source=\"console\"}"
      LabelKeys: container_id,ecs_task_arn,ecs_task_definition,ecs_cluster,container_name,log.level,log.logger_name,log.thread_name
      RemoveKeys: source,log.level_value,log.@version
      LineFormat: key_value
      insecure_skip_verify: true
    SecretOptions:
      - Name: TenantID
        ValueFrom: !Sub arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/config/loki/tenant-id
- Name: log_router
  Image: grafana/fluent-bit-plugin-loki:2.6.1-amd64
  Essential: false
  Memory: 512
  Cpu: 256
  FirelensConfiguration:
    Type: fluentbit
    Options:
      enable-ecs-log-metadata: true
  LogConfiguration:
    LogDriver: awslogs
    Options:
      awslogs-stream-prefix: firelens
      awslogs-group: !Ref LogGroup
      awslogs-region: !Ref AWS::Region
...
```
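Supporting dotted keys would start with reading the comma-separated `LabelKeys`/`RemoveKeys` values as paths instead of flat names. A minimal Go sketch (Go being the plugin's language), assuming a dot is the object separator; `parseKeyPaths` is a hypothetical helper, not part of the plugin:

```go
package main

import "strings"

// parseKeyPaths splits a comma-separated option value, such as the LabelKeys
// or RemoveKeys strings above, into dotted paths: "log.level" -> ["log", "level"].
// Using "." as the separator is an assumption of this sketch; the current
// plugin treats every key as a flat top-level name.
func parseKeyPaths(option string) [][]string {
	var paths [][]string
	for _, raw := range strings.Split(option, ",") {
		key := strings.TrimSpace(raw)
		if key == "" {
			continue // tolerate trailing or doubled commas
		}
		paths = append(paths, strings.Split(key, "."))
	}
	return paths
}
```

Flat keys such as `container_id` simply become one-element paths, so existing configurations would keep working.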
Describe alternatives you've considered. I also tried `LabelMapPath`, but that did not work either. I also tried `LineFormat: json`, with the same result.
Additional context. This is a really basic feature, and I bet a lot of people need it!
Another option that would be great: a way to specify which key becomes the final log line, i.e. `log.message` in my case, with all other properties ignored. That would greatly simplify the configuration by avoiding the need to list tons of `RemoveKeys`.
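The requested "message target" option could work by walking a dotted path into the record and using only that value as the line. A minimal Go sketch, assuming the record is already decoded into nested `map[string]interface{}` values; both `selectLine` and the option it models are hypothetical:

```go
package main

import "fmt"

// selectLine walks a dotted path such as ["log", "message"] into a decoded
// record and returns that value as the whole log line; all other fields are
// ignored. The option this models is hypothetical (the feature requested above).
func selectLine(record map[string]interface{}, path []string) (string, bool) {
	var cur interface{} = record
	for _, key := range path {
		obj, ok := cur.(map[string]interface{})
		if !ok {
			return "", false // the path runs into a non-object value
		}
		next, found := obj[key]
		if !found {
			return "", false // the key is missing from this record
		}
		cur = next
	}
	// Strings are returned as-is; numbers, booleans, etc. are formatted with fmt.
	if s, ok := cur.(string); ok {
		return s, true
	}
	return fmt.Sprint(cur), true
}
```

Returning `ok=false` lets the caller fall back to the full record when a line has no `log.message`.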
Hi @sschmiedleitner I double checked the code of the fluent-bit plugin, and you are right, nested labels cannot be extracted.
https://github.com/grafana/loki/blob/da6fd014448061df7ca3ffe71e469ab2f2d2e77b/clients/cmd/fluent-bit/loki.go#L160-L179
Maybe you want to give implementing recursive label extraction a try, e.g. using a dot as the object separator?
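A recursive approach along those lines could flatten the record with a dot separator and then pick the requested keys. This is a minimal Go sketch, not the plugin's code: it assumes the record is already a nested `map[string]interface{}`, and `flatten`/`extractLabels` are hypothetical names. Because Loki label names must match `[a-zA-Z_][a-zA-Z0-9_]*`, the sketch replaces dots with underscores in the resulting label names.

```go
package main

import (
	"fmt"
	"strings"
)

// flatten walks a decoded JSON value recursively and writes every leaf into
// out, joining nested object keys with a dot: {"log":{"level":"DEBUG"}}
// becomes {"log.level": "DEBUG"}. Leaves are formatted with fmt.Sprint.
func flatten(prefix string, value interface{}, out map[string]string) {
	obj, ok := value.(map[string]interface{})
	if !ok {
		out[prefix] = fmt.Sprint(value)
		return
	}
	for k, v := range obj {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		flatten(key, v, out)
	}
}

// extractLabels keeps only the requested dotted keys, the way LabelKeys
// could behave if nested paths were supported.
func extractLabels(record map[string]interface{}, labelKeys []string) map[string]string {
	flat := map[string]string{}
	flatten("", record, flat)
	labels := map[string]string{}
	for _, k := range labelKeys {
		if v, ok := flat[k]; ok {
			// Dots are not valid in Loki label names, so use underscores.
			labels[strings.ReplaceAll(k, ".", "_")] = v
		}
	}
	return labels
}
```

With the sample ECS record from the issue, the key `log.level` would yield the label `log_level="DEBUG"`.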
@chaudum Sorry, but not my language ;-)
@sschmiedleitner No problem. I marked the issue as good first issue, so people from the community can discover it.
Just wanted to note that the fluent-bit plugin is quite niche and therefore does not have the highest priority for us.
@sschmiedleitner you could also use the LabelMapPath (docs) configuration option. You would need to build a custom image that contains the JSON, though.
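As we understand the docs, `LabelMapPath` points at a JSON file whose nesting mirrors the record, with string leaves naming the label to create. The Go sketch below only illustrates that idea on in-memory maps; `applyLabelMap` is a hypothetical name, it maps string values only, and the real plugin's handling may differ:

```go
package main

// applyLabelMap walks a nested label map alongside a nested record. Where the
// map holds a string, that string becomes the label name for the record's
// value at the same position; where it holds an object, both are descended
// into together. Only string record values are mapped in this sketch.
func applyLabelMap(record, mapping map[string]interface{}, labels map[string]string) {
	for key, target := range mapping {
		value, ok := record[key]
		if !ok {
			continue // field not present in this record
		}
		switch t := target.(type) {
		case string:
			if s, ok := value.(string); ok {
				labels[t] = s
			}
		case map[string]interface{}:
			if nested, ok := value.(map[string]interface{}); ok {
				applyLabelMap(nested, t, labels)
			}
		}
	}
}
```

For the ECS record above, a map of `{"log": {"level": "level"}}` would produce `level="DEBUG"`.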
Hello @chaudum. I decided to investigate this issue and look for a good solution using the recursive approach you described above, and I hope I have found one. Could you take a look at my example implementation? If it looks right to you, I would be happy to open a PR. If not, I'll try to find a better solution, so feel free to assign this issue to me if possible. :)
> @sschmiedleitner you could also use the LabelMapPath (docs) configuration option. You would need to build a custom image that contains the JSON, though.
Actually, I tried to avoid building an extra image, because finding a proper place to host it is extra effort. In any case, I have already made a different workaround (extracting fewer labels), but the issue may still be valid for others.
Hi @chaudum, I see in the code that this is not resolved yet. Could I work on it?
I think more people run into this extra JSON parsing. We ship structured logs to Grafana Cloud using the OpenTelemetry Collector (this applies to both the `loki` and `otlphttp` exporters), and ran into a similar situation where we have to run the JSON parser twice: once over the whole log event and again over the log attribute.
Example query:
```
{Exporter="OTLP"} | json | line_format "{{ .attributes_log }}" | json
```
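The second `| json` is needed because, when the attribute holds serialized JSON, the first stage can only extract `attributes_log` as a string. The same two-stage decode, sketched in Go; the `attributes.log` field name is taken from the query and the event shape is an assumption, and `decodeTwice` is a hypothetical name:

```go
package main

import "encoding/json"

// decodeTwice mirrors the two json stages of the query: the outer event is
// decoded first, then the string held in attributes.log is decoded again.
// The event shape is assumed from the query, not from the exporter's docs.
func decodeTwice(event []byte) (map[string]interface{}, error) {
	var outer struct {
		Attributes struct {
			Log string `json:"log"`
		} `json:"attributes"`
	}
	if err := json.Unmarshal(event, &outer); err != nil {
		return nil, err
	}
	// Second pass: the attribute value is itself a JSON document.
	inner := map[string]interface{}{}
	if err := json.Unmarshal([]byte(outer.Attributes.Log), &inner); err != nil {
		return nil, err
	}
	return inner, nil
}
```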
I would love to see this feature as well, since I'm running into it.
It would also be nice if it could parse a JSON object regardless of whether it has been stringified.
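Handling both cases mostly comes down to checking the value's type before decoding. A minimal Go sketch; `asObject` is a hypothetical helper that accepts either an already-decoded object or a string containing a serialized JSON object:

```go
package main

import "encoding/json"

// asObject returns v as a JSON object whether it arrived already decoded
// (map[string]interface{}) or as a string containing serialized JSON.
// Anything else, including a string that is not a JSON object, yields ok=false.
func asObject(v interface{}) (map[string]interface{}, bool) {
	switch t := v.(type) {
	case map[string]interface{}:
		return t, true
	case string:
		var obj map[string]interface{}
		// "null" decodes without error into a nil map, so reject that too.
		if err := json.Unmarshal([]byte(t), &obj); err != nil || obj == nil {
			return nil, false
		}
		return obj, true
	default:
		return nil, false
	}
}
```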
You are supposed to parse those nested logs yourself, for example with Vector's `. |= object!(parse_json!(.my_nested_json))`. If you add tens or hundreds of nested fields as labels, the Loki index effectively turns into an inverted index.
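For comparison, that VRL line merges the parsed object into the event root. A rough Go equivalent, assuming the record is a `map[string]interface{}`; `mergeNestedJSON` is a hypothetical name, and unlike the VRL `!` calls it reports failures as an error value:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeNestedJSON parses the string stored under field and merges the
// resulting object into the top level of record, overwriting existing keys.
// The original field is left in place, as the VRL merge-assignment also does.
func mergeNestedJSON(record map[string]interface{}, field string) error {
	raw, ok := record[field].(string)
	if !ok {
		return fmt.Errorf("field %q is missing or not a string", field)
	}
	var parsed map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &parsed); err != nil {
		return fmt.Errorf("field %q: %w", field, err)
	}
	for k, v := range parsed {
		record[k] = v // values from the nested JSON win on conflicts
	}
	return nil
}
```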
If you prefer this kind of log analytics (indexing all fields for search), Loki is not really a good option for you, and you should probably look at other logging platforms, e.g. OpenSearch or Betterstack.
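The indexing concern comes from Loki's stream model: every distinct combination of label values is a separate stream. A small Go sketch (hypothetical `countStreams`) counts the streams a batch of label sets would create, which shows how each extra high-cardinality label can multiply the count:

```go
package main

import (
	"sort"
	"strings"
)

// countStreams returns how many distinct label sets (i.e. Loki streams) a
// batch of records would produce. Keys are sorted so that equal sets built
// in a different order produce the same fingerprint.
func countStreams(labelSets []map[string]string) int {
	seen := map[string]struct{}{}
	for _, ls := range labelSets {
		keys := make([]string, 0, len(ls))
		for k := range ls {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		var b strings.Builder
		for _, k := range keys {
			b.WriteString(k)
			b.WriteByte('=')
			b.WriteString(ls[k])
			b.WriteByte(0) // separator unlikely to appear in label values
		}
		seen[b.String()] = struct{}{}
	}
	return len(seen)
}
```

Adding a label such as `thread_name` to otherwise identical records splits them into as many streams as there are thread names.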