JSON unmarshalling of labels fails to parse boolean values
Describe the bug
From version 2.9.0, JSON payloads sent to the HTTP push endpoint are rejected with 400 Bad Request when there are boolean values in the labels section (under the stream field).
To Reproduce
Given a locally running Loki instance on port 3100, execute the following bash script:
export data='{
  "streams": [
    {
      "stream": {
        "namespace": "hey",
        "has_errors": true
      }
    }
  ]
}'
curl -v http://localhost:3100/loki/api/v1/push -H "Content-Type: application/json" --data "$data"
The script outputs:
* Trying 127.0.0.1:3100...
* Connected to localhost (127.0.0.1) port 3100 (#0)
> POST /loki/api/v1/push HTTP/1.1
> Host: localhost:3100
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 155
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Thu, 28 Sep 2023 13:17:42 GMT
< Content-Length: 264
<
loghttp.PushRequest.Streams: []loghttp.LogProtoStream: unmarshalerDecoder: Value is string, but can't find closing '"' symbol, error found in #10 byte of ...|
}
]
}|..., bigger context ...| "has_errors": true
}
}
]
}|...
* Connection #0 to host localhost left intact
I used a simple local docker-compose to spin up Loki instances, and confirmed that the issue does not exist in 2.8.5 but exists in both 2.9.0 and 2.9.1:
version: "3"
networks:
  loki:
services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - loki
Expected behavior
In 2.8.5 and previous versions the label is accepted (response code 204 No Content). I am not entirely sure what happens to boolean values or how they are included in the data...
Environment:
- Infrastructure: docker-compose on laptop
- Deployment tool: N/A
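As a workaround until a fixed release is out, quoting the boolean so it arrives as a string seems to do the trick, since Loki label values are strings anyway. This is just the reproduction payload from above with has_errors quoted; it should come back as 204 No Content on 2.9.x as well:
export data='{
  "streams": [
    {
      "stream": {
        "namespace": "hey",
        "has_errors": "true"
      }
    }
  ]
}'
curl -v http://localhost:3100/loki/api/v1/push -H "Content-Type: application/json" --data "$data"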
At first sight it looks like the changes in pkg/loghttp/entry.go are responsible for this. There, the individual keys and values are parsed as strings, so any other datatype as a value will fail.
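For example, a numeric label value should trip over the same code path (a quick sketch against the same local setup as in the report; retries is just a made-up label):
# expected on 2.9.0/2.9.1: 400 Bad Request with the same unmarshalerDecoder error
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3100/loki/api/v1/push \
  -H 'Content-Type: application/json' \
  --data '{"streams":[{"stream":{"namespace":"hey","retries":3}}]}'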
I will take a shot at this later today. :)
Sorry, my first glance was incorrect. :D Your issue should actually already be fixed by this commit:
https://github.com/grafana/loki/commit/5ab9515020658860053590c9e38e0262be78a9b1
However, this is not part of v2.9.2.
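If you want to see which release tags eventually contain it, one way is to check the fix commit in a Loki checkout:
# clone (or use an existing checkout) and list the tags that contain the commit
git clone https://github.com/grafana/loki.git && cd loki
git tag --contains 5ab9515020658860053590c9e38e0262be78a9b1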
Ah that looks like it would do it, yes, @szczepad !
How do I track which release it goes out in? I'll test as soon as it becomes available in a release :smile:
@szczepad fwiw this still seems to be a bug in 2.9.6 - but it works in 3.0.0 :smile:
I'd love to get it out in 2.x so we can get the fix without having to upgrade a major version :smile:
This seems like the opposite of what you'd probably want, btw? 3.0.0 seems like the obvious time to get rid of non-string values in labels, not the release that starts accepting them again.
Hey, sorry to bother you. This bug seems to be fixed in 2.9.7 :) We haven't had this error in our logs since. Thanks for everything!
I am still seeing the error in 2.9.7.
docker-compose.yml:
version: "3"
networks:
  loki:
services:
  loki-2.9.7:
    image: grafana/loki:2.9.7
    ports:
      - "3199:3100"
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - loki
curl.sh:
export data='{
  "streams": [
    {
      "stream": {
        "namespace": "hey",
        "has_errors": true
      }
    }
  ]
}'
echo -e '\nLOKI 2.9.7 TEST BOOLEAN PARSE\n'
curl -v -X POST http://localhost:3199/loki/api/v1/push -H "Content-Type: application/json" --data "$data"
Running my test:
$ ./curl.sh
LOKI 2.9.7 TEST BOOLEAN PARSE
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 127.0.0.1:3199...
* Connected to localhost (127.0.0.1) port 3199 (#0)
> POST /loki/api/v1/push HTTP/1.1
> Host: localhost:3199
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 155
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Wed, 10 Apr 2024 14:42:14 GMT
< Content-Length: 264
<
loghttp.PushRequest.Streams: []loghttp.LogProtoStream: unmarshalerDecoder: Value is string, but can't find closing '"' symbol, error found in #10 byte of ...|
}
]
}|..., bigger context ...| "has_errors": true
}
}
]
}|...
* Connection #0 to host localhost left intact
Thanks for everything :) OK, so we need to go to 3.0... I will start migrating :/
Same issue here with Loki 3.0.0 and Logging-Operator 4.9.0.
@R-Studio I checked 3.0.0 and didn't find the error there, cf. https://github.com/grafana/loki/issues/10738#issuecomment-2044384294 - can you share a minimal reproduction of your setup so we can see whether it is the same issue?
So I have installed Logging-Operator version 4.9.0 with the following ClusterFlow & ClusterOutput (Helm values):
...
clusterFlows:
  - name: all-logs
    spec:
      filters:
        - tag_normaliser: {}
      match:
        - exclude:
            labels:
              app.kubernetes.io/name: event-tailer
              app.kubernetes.io/instance: audit-host-tailer
        - select: {}
      globalOutputRefs:
        - loki-all-output
clusterOutputs:
  - name: loki-all-output
    spec:
      loki:
        url: http://loki-gateway.loki.svc.cluster.local
        line_format: key_value
        configure_kubernetes_labels: true
        extract_kubernetes_labels: true
        include_thread_label: false
        drop_single_key: true
        remove_keys:
          - logtag
          - stream
          - time
        buffer:
          disable_chunk_backup: true
          flush_thread_count: 8
          flush_mode: immediate
          timekey: 1m
          timekey_wait: 5s
          retry_forever: false
          retry_timeout: 24h
...
And I get the following error in the Fluentd pods of the Logging-Operator:
2024-08-22 12:12:39 +0000 [warn]: #0 [clusterflow:logging-operator:all-logs:clusteroutput:logging-operator:loki-all-output] failed to write post to http://loki-gateway.loki.svc.cluster.local/loki/api/v1/push (400 Bad Request loghttp.PushRequest.Streams: []loghttp.LogProtoStream: unmarshalerDecoder: Value is string, but can't find closing '"' symbol, error found in #10 byte of ...|eged"}}]]}]}|..., bigger context ...|-security.kubernetes.io/enforce":"privileged"}}]]}]}|...
)
We have Loki installed with the Helm chart version 6.6.3.
I upgraded Loki to the newest Helm chart, 6.10.0 (no difference):
message="level=error ts=2024-08-22T12:24:40.459112136Z caller=manager.go:49 component=distributor path=write msg=\"write operation failed\" details=\"couldn't parse push request: loghttp.PushRequest.Streams: []loghttp.LogProtoStream: unmarshalerDecoder: Value is string, but can't find closing '\\\"' symbol, error found in #10 byte of ...|\\\\\\\"}}\\\\\\\"\\\"]]}]}|..., bigger context ...|ernetes.io/enforce\\\\\\\\\\\\\\\"=>\\\\\\\\\\\\\\\"privileged\\\\\\\\\\\\\\\"}}\\\\\\\"\\\"]]}]}|...\" org_id=fake" kubernetes_namespace="{\"name\"=>\"loki\", \"labels\"=>{\"kubernetes.io/metadata.name\"=>\"loki\"}}"
I found a solution/workaround by removing the key kubernetes_namespace. So my new ClusterOutput looks like this:
...
clusterOutputs:
  - name: loki-all-output
    spec:
      loki:
        url: http://loki-gateway.loki.svc.cluster.local
        line_format: key_value
        configure_kubernetes_labels: true
        extract_kubernetes_labels: true
        include_thread_label: false
        drop_single_key: true
        remove_keys:
          - logtag
          - stream
          - time
          - kubernetes_namespace # This fixes namespace labels extraction (https://github.com/grafana/loki/issues/10738)
        buffer:
          disable_chunk_backup: true
          flush_thread_count: 8
          flush_mode: immediate
          timekey: 1m
          timekey_wait: 5s
          retry_forever: false
          retry_timeout: 24h
We are experiencing the same behavior as described; the workaround with kubernetes_namespace helped as well. Installed Helm chart versions are:
- Logging-operator: 5.1.1
- Loki: 6.25.1
We have been seeing an almost identical error using recent versions of Logging Operator and Loki (v3.4.2).
level=error ts=2025-05-14T11:48:03.623136385Z caller=manager.go:50 component=distributor path=write msg="write operation failed" details="couldn't parse push request: loghttp.PushRequest.Streams: []loghttp.LogProtoStream: unmarshalerDecoder: Value is string, but can't find closing '\"' symbol, error found in #10 byte of ...|3\",null]]}]}|..., bigger context ...|06\",\"(no headers)\"],[\"1747223280349181203\",null]]}]}|..." org_id=fake
Since we never see the entire log message, it is very hard for us to pinpoint the exact root cause or any shared traits between the messages. We do, however, see an increased occurrence of null values in the visible bits of the log messages.
We have adjusted our ClusterFlow to drop empty messages, and we have now gone almost an hour without any unmarshalerDecoder errors, whereas they used to be a steady stream.
filters:
  # drop empty messages
  - grep:
      exclude:
        - key: message
          pattern: "/^\\s*$/"
I think we're hitting the same bug, but we're not finding any error messages, so we're not 100% sure. But here goes:
We're using kube-logging as well, and have a basic flow that parses the message as JSON.
On the output, we try to set .spec.loki.labels.blocked: $.transaction.is_interrupted, where $.transaction.is_interrupted is a boolean value. Logs end up in Loki as expected, except that the blocked label is missing. The $.transaction.is_interrupted field is no longer in the record, so it has been parsed.
We currently circumvent this by using a record modifier to turn the boolean into a string. With that in place, the blocked label does show up as expected, but it's kind of an ugly hack. IMHO, if Loki can't take non-string values for labels, the fluentd plugin should normalize label values into strings.
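For illustration, "normalizing label values into strings" at the payload level would amount to something like this jq transform over the push body (just to make the intent concrete; an actual fix would belong in the fluentd output plugin):
# coerce every label value in the push payload to a string,
# e.g. with $data set to the reproduction payload from earlier in this issue
echo "$data" | jq '.streams[].stream |= with_entries(.value |= tostring)'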