
could not pack/validate JSON response

Open acarsercan opened this issue 6 years ago • 8 comments

Bug Report

Describe the bug
I'm running fluent-bit inside Kubernetes to ship pod log entries to Elasticsearch.

I can see the majority of the logs coming in; however, I've noticed the following errors from the fluent-bit pods running on each Kubernetes node.

How can I identify which pod is causing the error? How can I resolve the issue below?

logs/fluent-bit-zjbxx[fluent-bit]: [2019/10/23 07:16:53] [error] [out_es] could not pack/validate JSON response 
logs/fluent-bit-nqdwb[fluent-bit]: [2019/10/23 07:17:00] [error] [out_es] could not pack/validate JSON response

Environment: Kubernetes 1.13, fluent-bit Helm chart, fluent-bit 1.2.2

ConfigMap from Kubernetes:

data:
  fluent-bit-filter.conf: |-
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_Tag_Prefix     kube.var.log.containers.
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
        Merge_Log_Trim      On
  fluent-bit-input.conf: "[INPUT]\n    Name             tail\n    Path             /var/log/containers/*.log\n
    \   Parser           docker\n    Tag              kube.*\n    Refresh_Interval
    5\n    Mem_Buf_Limit    5MB\n    Skip_Long_Lines  On\n    "
  fluent-bit-output.conf: "\n[OUTPUT]\n    Name  es\n    Match *\n    Host  XXXX\n
    \   Port  9200\n    Logstash_Format On\n    Retry_Limit False\n    Type  flb_type\n
    \   Time_Key @timestamp\n    Replace_Dots On\n    Logstash_Prefix kubernetes_cluster\n\n
    \   HTTP_User admin\n    HTTP_Passwd XXXX\n\n    tls on\n    tls.verify
    off\n    tls.debug 1\n    tls.ca_file /secure/es-tls-ca.crt\n\n    "
  fluent-bit-service.conf: |-
    [SERVICE]
        Flush        1
        Daemon       Off
        Log_Level    info
        Parsers_File parsers.conf
        Parsers_File parsers_custom.conf
  fluent-bit.conf: |-
    @INCLUDE fluent-bit-service.conf
    @INCLUDE fluent-bit-input.conf
    @INCLUDE fluent-bit-filter.conf
    @INCLUDE fluent-bit-output.conf
kind: ConfigMap
metadata:
  annotations:
    flux.weave.works/antecedent: logs:helmrelease/fluent-bit
  creationTimestamp: "2019-05-30T08:05:58Z"
  labels:
    app: fluent-bit
    chart: fluent-bit-2.7.3
    heritage: Tiller
    release: fluent-bit
  name: fluent-bit-config
  namespace: logs
  resourceVersion: "99683776"
  selfLink: /api/v1/namespaces/logs/configmaps/fluent-bit-config
  uid: c1c81cd4-82b1-11e9-bace-06a20d1eb048


acarsercan avatar Oct 23 '19 07:10 acarsercan

Would you please upgrade to Fluent Bit v1.3.2 and let us know how it behaves?

Note: just in case, always read the upgrade notes: https://docs.fluentbit.io/manual/installation/upgrade_notes

edsiper avatar Nov 01 '19 11:11 edsiper

Thanks for the reply. I've upgraded since my query, but unfortunately it has made no difference. Is there any way for me to track which pod/app is causing [out_es] could not pack/validate JSON response?

I've enabled debug logging in fluent-bit, but unfortunately I'm failing to make much sense of the output:

[2019/11/01 12:13:26] [debug] [out_es] Elasticsearch response
{"took":3257,"errors":false,"items":[{"index":{"_index":"kubernetes_cluster-2019.10.31","_type":"_doc","_id":"2AriJm4BBmTb-erSaYRK","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":31403479,"_primary_term":1,"status":201}},{"index":{"_index":"kubernetes_cluster-2019.10.31","_type":"_doc","_id":"2QriJm4BBmTb-erSaYRK","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":31403481,"_primary_term":1,"status":201}},{"index":{"_index":"kubernetes_cluster-2019.10.31","_type":"_doc","_id":"2griJm4BBmTb-erSaYRK","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":31403483,"_primary_term":1,"status":201}},{"index":{"_index":"kubernetes_cluster-2019.10.31","_type":"_doc","_id":"2wriJm4BBmTb-erSaYRK","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":31403485,"_primary_term":1,"status":201}},{"index
[2019/11/01 12:13:26] [debug] [filter_kube] could not merge JSON log as requested
[2019/11/01 12:13:26] [debug] [filter_kube] could not merge JSON log as requested
[2019/11/01 12:13:26] [debug] [filter_kube] could not merge JSON log as requested

[2019/11/01 12:13:26] [debug] [filter_kube] could not merge JSON log as requested
[2019/11/01 12:13:26] [debug] [in_tail] file=/var/log/containers/calico-node-rqljw_kube-system_calico-node-495f261132e303ef2258badecb7448130df05c6898963b51862dd360f5ade638.log read=32575 lines=116
[2019/11/01 12:13:26] [debug] [in_tail] file=/var/log/containers/concourse-web-8d955d58d-xhnmm_concourse_concourse-web-3768034ad7be3f792006afa2e9e3b8eb772d065b632d7fe08d280d3b37cbea46.log read=32445 lines=104
[2019/11/01 12:13:27] [debug] [filter_kube] could not merge JSON log as requested
[2019/11/01 12:13:27] [debug] [filter_kube] could not merge JSON log as requested

[2019/11/01 12:13:27] [debug] [in_tail] file=/var/log/containers/nginx-ingress-internal-controller-574d88b97d-4ntgg_nginx-ingress-internal_nginx-ingress-controller-9e22c1fe05fcbf8213b7e5b3c18642a7e61fb5813ecc96f8bb4eb0baa7cbb859.log read=32606 lines=86
[2019/11/01 12:13:27] [debug] [in_tail] file=/var/log/containers/production-dynamo-latest-schedule-service-6c55cd95dc-47wf2_production-dynamo_dynamo-latest-schedule-service-8aeb20c3bd60fd2d8f0caef778e500e40b3d8b00510494b06e1900c5af4ca83c.log read=6696 lines=18
[2019/11/01 12:13:27] [debug] [filter_kube] could not merge JSON log as requested
[2019/11/01 12:13:27] [debug] [filter_kube] could not merge JSON log as requested
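
One way to narrow this down (a sketch, not something suggested in this thread) is to temporarily tail a single suspect container with its own tag and print the raw records to the fluent-bit pod's own stdout, so you can see exactly which lines are not valid JSON. The path glob and tag below are hypothetical placeholders:

    [INPUT]
        Name    tail
        # hypothetical: narrow the glob to one suspect pod's container log
        Path    /var/log/containers/my-suspect-pod*.log
        Parser  docker
        Tag     debugcheck.*

    [OUTPUT]
        Name    stdout
        Match   debugcheck.*
        Format  json_lines

Whichever container still produces "could not merge JSON log as requested" under this isolated pipeline is the one emitting non-JSON lines.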

acarsercan avatar Nov 01 '19 12:11 acarsercan

So the main requirement is: "how do I identify which Pod is not logging in JSON?"

That makes sense; I will improve the message for debugging purposes.
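
For reference, later Fluent Bit releases document a Trace_Error option on the es output that prints the Elasticsearch API call when the server returns an error; whether it is available depends on the version in use. A minimal sketch, showing only the relevant keys of the existing es output section:

    [OUTPUT]
        Name        es
        Match       *
        # print the Elasticsearch API call involved in the error (diagnostics only)
        Trace_Error On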

edsiper avatar Nov 01 '19 12:11 edsiper

Hi @edsiper,

I came across this error with Fluent Bit v1.3.7. Below is the log entry, and we can clearly see that the closing braces '}}' are missing from the response. Could this be related to hitting an allowed character limit for the log document, so that the rest of the response is stripped out? Below is my log entry for your reference. All the log entries it fails on have 937 characters.

[2020/02/28 08:08:53] [error] [out_es] could not pack/validate JSON response {"took":203, "errors":true, "items":[ {"index":{ "_index":"logs-2020.02.10", "_type":"_doc","_id":"xxxxxxxxxxxxxxxxxxx", "_version":1,"result":"created", "_shards":{"total":2,"successful":2,"failed":0}, "_seq_no":11379, "_primary_term":1, "status":201 } }, {"index":{ "_index":"logs-2020.02.10", "_type":"_doc", "_id":"xxxxxxxxxxxxxxxxxxx", "_version":1, "result":"created", "_shards":{"total":2,"successful":2,"failed":0}, "_seq_no":11455, "_primary_term":1, "status":201 } }, {"index":{ "_index":"logs-2020.02.10", "_type":"_doc", "_id":"xxxxxxxxxxxxxxxxxxx", "_version":1, "result":"created", "_shards":{"total":2,"successful":2,"failed":0}, "_seq_no":11380,"_primary_term":1,"status":201 } }, {"index":{ "_index":"logs-2020.02.10", "_type":"_doc", "_id":"xxxxxxxxxxxxxxxxxxx", "_version":1, "result":"created", "_shards":{"total":2,"successful":2,"failed":0}, "_seq_no":11456, "_primary_term":1, "status"
[2020/02/28 08:08:53] [ warn] [engine] failed to flush chunk '1-********************.flb', retry in 867 seconds: task_id=14, input=tail.0 > output=es.0
[2020/02/28 08:23:20] [error] [out_es] could not pack/validate JSON response {"took":97,"errors":true,"items":[{"index":{"_index":"logs-2020.02.10","_type":"_doc","_id":"xxxxxxxxxxxxxxxxxxx"

Regards,
Shiv
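
If the bulk response really is being cut off, one possibility worth checking (not confirmed in this thread) is the es output's documented Buffer_Size setting, which controls how much of the Elasticsearch HTTP response is read; a response larger than the buffer would be truncated and would then fail JSON validation. A minimal sketch, showing only the relevant keys:

    [OUTPUT]
        Name        es
        Match       *
        # read the full Elasticsearch response; 'False' means an unlimited buffer
        Buffer_Size False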

shivshankarb avatar Feb 28 '20 13:02 shivshankarb

Hi @edsiper, I observed that this error is followed by the warning below for one specific document only:

[2020/02/28 12:50:29] [ warn] [engine] failed to flush chunk '1-1582845332.575759250.flb', retry in 1863 seconds: task_id=14, input=tail.0 > output=es.0
. . .
Feb 28, 2020 @ 00:09:46.200 | [2020/02/28 00:09:46] [ warn] [engine] failed to flush chunk '1-1582845332.575759250.flb', retry in 364 seconds: task_id=14, input=tail.0 > output=es.0

Is there any way I can get rid of this specific document or instruct fluent-bit to ignore it?

Regards,
Shiv
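
Not an answer from the maintainers, but two documented knobs that could be relevant here, sketched with hypothetical values: a finite Retry_Limit on the es output makes the engine eventually discard a chunk it cannot flush, and a grep filter with an Exclude rule can drop records matching a pattern before they ever reach the output.

    [FILTER]
        Name    grep
        Match   kube.*
        # hypothetical pattern: drop records whose 'log' field matches this regex
        Exclude log some-known-bad-pattern

    [OUTPUT]
        Name        es
        Match       *
        # give up on a chunk after 3 failed retries instead of retrying forever
        Retry_Limit 3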

shivshankarb avatar Feb 28 '20 13:02 shivshankarb

Hi,

I'm getting something similar:

[2020/09/14 17:51:42] [error] [out_es] could not pack/validate JSON response {"took":146,"errors":true,"items":[{"index":{"_index":"k8s-qa-2020.09.14-00001","_type":"_doc","_id":"5t2-jXQBx8Yze5ALb7r5","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":841544,"_primary_term":1,"status":201}},{"index":{"_index":"k8s-qa-2020.09.14-00001","_type":"_doc","_id":"592-jXQBx8Yze5ALb7r5","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":841545,"_primary_term":1,"status":201}},{"index":{"_index":"k8s-qa-2020.09.14-00001","_type":"_doc","_id":"6N2-jXQBx8Yze5ALb7r5","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":841546,"_primary_term":1,"status":201}},{"index":{"_index":"k8s-qa-2020.09.14-00001","_type":"_doc","_id":"6d2-jXQBx8Yze5ALb7r5","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":841547,"_primary_term":1,"status":201}},{"index":{"_index":"k8s-q

Since I first saw those errors I've upgraded the helm chart from "2.8.14" to "2.8.16"; I'll see whether it makes any difference. In any case, the error pasted above showed up after the upgrade.

Also, logs like the one above seem to be getting truncated. I'm not sure whether there is a parameter to avoid the truncation...

Regards, Andrea

andreabolandrina avatar Sep 14 '20 17:09 andreabolandrina

Hello, is there anything new on this?

I hit the same issue on v3.1.9.

liuchintao avatar Nov 05 '24 07:11 liuchintao

At present I am facing the same problem with Fluent Bit v3.2.10.

Below is the error:

"2025-11-28T07:20:10.569140943Z - ERROR - output:es:es_platform_test_haproxy - could not pack/validate JSON response",

"2025-11-28T07:20:10.571923906Z - WARN - engine - failed to flush chunk '489711-1764072120.279429671.flb', retry in 610 seconds: task_id=15, input=emitter_for_rewrite_tag.16 > output=es_platform_test_haproxy (out_id=129)",

"2025-11-28T07:20:47.602257507Z - WARN - engine - failed to flush chunk '491990-1764155363.322675567.flb', retry in 576 seconds: task_id=5, input=emitter_for_rewrite_tag.14 > output=es_uat_docker_global (out_id=166)",

What is the exact reason?

yogesh119905 avatar Nov 28 '25 07:11 yogesh119905