
unable to forward logs to index via namespace annotation

Open akandimalla opened this issue 3 years ago • 9 comments

What happened: We have an index called "logs-nonprod" and an application named "app" deployed to three EKS clusters (dev, qa, stg), in the namespaces app-dev, app-qa, and app-stg respectively. Each namespace is annotated with the index logs-nonprod on its cluster.

dev: k annotate --overwrite ns app-dev splunk.com/index=logs-nonprod
qa: k annotate --overwrite ns app-qa splunk.com/index=logs-nonprod
stg: k annotate --overwrite ns app-stg splunk.com/index=logs-nonprod

For the last 24 hours, only the qa EKS cluster has failed to send logs to this index; its logs are arriving in the "eks-default" index instead. The dev and stg clusters are sending logs without any issues.
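Since only one of three clusters misbehaves, it can help to read back the annotation each cluster actually holds rather than relying on the annotate command having succeeded. The following is a minimal sketch, assuming kubectl is on the PATH and that the kubeconfig context names passed in are your own; the helper names are hypothetical and not part of SCK.

```python
import json
import subprocess

INDEX_ANNOTATION = "splunk.com/index"


def index_annotation(resource):
    """Return the splunk.com/index annotation of a parsed Kubernetes object, or None."""
    annotations = resource.get("metadata", {}).get("annotations") or {}
    return annotations.get(INDEX_ANNOTATION)


def namespace_index(namespace, context=None):
    """Read the annotation from a live cluster by shelling out to kubectl."""
    cmd = ["kubectl", "get", "namespace", namespace, "-o", "json"]
    if context:
        # Context name is an assumption; use whatever your kubeconfig defines.
        cmd += ["--context", context]
    output = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return index_annotation(json.loads(output))


# Example (not executed here): namespace_index("app-qa", context="qa")
```

The same index_annotation helper works on pod objects, which matters because a pod-level splunk.com/index annotation can also set the index.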

What you expected to happen: Logs are sent to the Splunk index.
How to reproduce it (as minimally and precisely as possible): NA
Anything else we need to know?: NA
Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.4-eks-6b7464", GitCommit:"6b746440c04cb81db4426842b4ae65c3f7035e53", GitTreeState:"clean", BuildDate:"2021-03-19T19:35:50Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.14-eks-18ef993", GitCommit:"ac73613dfd25370c18cbbbc6bfc65449397b35c7", GitTreeState:"clean", BuildDate:"2022-07-06T18:06:50Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
  • Ruby version (use ruby --version):
  • OS (e.g: cat /etc/os-release): NAME="Amazon Linux" VERSION="2" ID="amzn" ID_LIKE="centos rhel fedora" VERSION_ID="2" PRETTY_NAME="Amazon Linux 2" ANSI_COLOR="0;33" CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2" HOME_URL="https://amazonlinux.com/"
  • Splunk version: 8.1.1 enterprise
  • Splunk Connect for Kubernetes helm chart version: 1.4.15
  • Others:

akandimalla avatar Sep 30 '22 16:09 akandimalla

Hi @akandimalla, can you please share pod logs and values.yaml config file?

hvaghani221 avatar Oct 06 '22 11:10 hvaghani221

Hi @harshit-splunk - Here are the details:

global:
  logLevel: info
  splunk:
    hec:
      host: splunk.xxxx.com
      port: 443
      token: xxxxx
      protocol: https
      endpoint:
      fullUrl:
      indexName:
      insecureSSL: true
      clientCert:
      clientKey:
      caFile:
      indexRouting:
      consume_chunk_on_4xx_errors:
  kubernetes:
    clusterName: xxxxxx
  prometheus_enabled:
  monitoring_agent_enabled:
  monitoring_agent_index_name:
  metrics:
    service:
      enabled: true
      headless: true
  serviceMonitor:
    enabled: false

    metricsPort: 24231
    interval: ""
    scrapeTimeout: "10s"

    additionalLabels: { }

splunk-kubernetes-logging:
  enabled: true
  logLevel:

  namespace:

  fluentd:
    path: /var/log/containers/*.log
    exclude_path:

  containers:
    path: /var/log
    pathDest: /var/lib/docker/containers
    logFormatType: json
    logFormat:
    refreshInterval:
    removeBlankEvents: true
    localTime: false

  k8sMetadata:
    podLabels:
      - app
      - k8s-app
      - release
    watch: true
    cache_ttl: 3600

  sourcetypePrefix: "kube"

  rbac:
    create: true
    openshiftPrivilegedSccBinding: false

  serviceAccount:
    create: true
    name:

  podSecurityPolicy:
    create: false
    apparmor_security: true
    apiGroup: policy

  splunk:
    hec:
      host:
      port:
      token:
      protocol:
      endpoint:
      fullUrl:
      indexName:
      insecureSSL:
      clientCert:
      clientKey:
      caFile:
      consume_chunk_on_4xx_errors:
      gzip_compression:
    ingest_api:
      serviceClientIdentifier:
      serviceClientSecretKey:
      tokenEndpoint:
      ingestAuthHost:
      ingestAPIHost:
      tenant:
      eventsEndpoint:
      debugIngestAPI:

  secret:
    create: true
    name:

  journalLogPath: /run/log/journal

  charEncodingUtf8: false

  logs:
    docker:
      from:
        journald:
          unit: docker.service
      sourcetype: kube:docker
    kubelet: &glog
      from:
        journald:
          unit: kubelet.service
      multiline:
        firstline: /^\w[0-1]\d[0-3]\d/
      sourcetype: kube:kubelet
    etcd:
      from:
        pod: etcd-server
        container: etcd-container
    etcd-minikube:
      from:
        pod: etcd-minikube
        container: etcd
    etcd-events:
      from:
        pod: etcd-server-events
        container: etcd-container
    kube-apiserver:
      <<: *glog
      from:
        pod: kube-apiserver
      sourcetype: kube:kube-apiserver
    kube-scheduler:
      <<: *glog
      from:
        pod: kube-scheduler
      sourcetype: kube:kube-scheduler
    kube-controller-manager:
      <<: *glog
      from:
        pod: kube-controller-manager
      sourcetype: kube:kube-controller-manager
    kube-proxy:
      <<: *glog
      from:
        pod: kube-proxy
      sourcetype: kube:kube-proxy
    kubedns:
      <<: *glog
      from:
        pod: kube-dns
      sourcetype: kube:kubedns
    dnsmasq:
      <<: *glog
      from:
        pod: kube-dns
      sourcetype: kube:dnsmasq
    dns-sidecar:
      <<: *glog
      from:
        pod: kube-dns
        container: sidecar
      sourcetype: kube:kubedns-sidecar
    dns-controller:
      <<: *glog
      from:
        pod: dns-controller
      sourcetype: kube:dns-controller
    kube-dns-autoscaler:
      <<: *glog
      from:
        pod: kube-dns-autoscaler
        container: autoscaler
      sourcetype: kube:kube-dns-autoscaler
    kube-audit:
      from:
        file:
          path: /var/log/kube-apiserver-audit.log
      timestampExtraction:
        format: "%Y-%m-%dT%H:%M:%SZ"
      sourcetype: kube:apiserver-audit

  image:
    registry: docker.io
    name: splunk/fluentd-hec
    tag: 1.3.0
    pullPolicy: IfNotPresent
    usePullSecret: false
    pullSecretName:

  environmentVar:

  podAnnotations:

  extraLabels:

  resources:
    requests:
      cpu: 100m
      memory: 200Mi

  bufferChunkKeys:
  - index
  buffer:
    "@type": memory
    total_limit_size: 600m
    chunk_limit_size: 20m
    chunk_limit_records: 100000
    flush_interval: 5s
    flush_thread_count: 1
    overflow_action: block
    retry_max_times: 10
    retry_type: periodic
    retry_wait: 30

  sendAllMetadata: false

  tolerations:
    - key: node-role.kubernetes.io/master
      effect: NoSchedule

  nodeSelector:
    kubernetes.io/os: linux

  affinity: {}

  extraVolumes: []
  extraVolumeMounts: []

  priorityClassName:

  kubernetes:
    clusterName:
    securityContext: false


  customMetadata:

  customMetadataAnnotations:

  customFilters: {}

  indexFields: []

  rollingUpdate:

Logs: Although they show errors, I checked and confirmed that the namespaces have the correct annotation.

2022-10-06 12:47:06 +0000 [info]: #0 stats - namespace_cache_size: 5, pod_cache_size: 33, pod_cache_watch_updates: 253, pod_cache_host_updates: 105, pod_cache_watch_ignored: 58, pod_cache_watch_delete_ignored: 56, namespace_cache_api_updates: 89, pod_cache_api_updates: 89, id_cache_miss: 89, pod_watch_gone_errors: 5, pod_watch_gone_notices: 5
2022-10-06 12:47:07 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:07 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:13 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:13 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:19 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:19 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:25 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:25 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:31 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}

akandimalla avatar Oct 06 '22 12:10 akandimalla

Can you check the error logs in the _internal index? They will show which index the collector is trying to send events to.

hvaghani221 avatar Oct 06 '22 12:10 hvaghani221

Also, have you modified any of the template files?

hvaghani221 avatar Oct 06 '22 13:10 hvaghani221

Can you explain what is meant by the _internal index? Should I check with my Splunk admin about the index error logs at their end?
No changes were made to the template files. Using the default ones.

akandimalla avatar Oct 06 '22 13:10 akandimalla

Should I check with my Splunk admin about the index error logs at their end?

Yes, Splunk writes an entry to its _internal index when data is sent to an invalid index.

hvaghani221 avatar Oct 06 '22 13:10 hvaghani221

The error:

2022-10-06 12:47:25 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}

I've encountered this error in two situations:

  1. misspelled index name in the splunk.com/index annotation (namespace or pod)
  2. missing permissions on the index for the HEC token

Unfortunately, the error message doesn't contain the name of the index that fails.
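To see exactly what the reply does carry, the JSON after "response: " can be pulled out of the fluentd error line. This is a minimal sketch that assumes the log line format shown in this thread; hec_reply is a hypothetical helper, not something SCK provides.

```python
import json
import re

# Matches the JSON reply that follows "response: " at the end of the error line.
RESPONSE_RE = re.compile(r"response: (\{.*\})\s*$")


def hec_reply(log_line):
    """Return the HEC reply embedded in a fluentd error line, or None."""
    match = RESPONSE_RE.search(log_line)
    return json.loads(match.group(1)) if match else None


line = ('2022-10-06 12:47:25 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: '
        'Failed POST to https://xxxxxxxx.com/services/collector, '
        'response: {"text":"Incorrect index","code":7,"invalid-event-number":1}')
reply = hec_reply(line)
print(reply["code"], reply["text"])  # 7 Incorrect index
```

The reply holds only a text, a code, and the number of the rejected event in the batch, so the index name has to be found elsewhere.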

vinzent avatar Oct 06 '22 13:10 vinzent

Checked with the Splunk admin, and they don't see any issues on their end. I checked my cluster and confirmed that the annotations are correct on the namespaces.

akandimalla avatar Oct 06 '22 13:10 akandimalla

The error message logged by the Splunk logging pod comes from the Splunk server. So your SCK is connecting to the Splunk server successfully, but the server is rejecting your logs. If you are sure the annotations are correct, there is just one remaining option: the HEC token you use doesn't have permission for the index.
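One way to test the token independently of SCK is to post a single event to HEC with the index set explicitly and look at the reply. The sketch below uses only the Python standard library; the function names, the event text, and the example URL are assumptions for illustration, and verify_tls=False mirrors the insecureSSL: true setting in the values.yaml above.

```python
import json
import ssl
import urllib.error
import urllib.request


def build_hec_request(base_url, token, index):
    """Build a POST to the HEC collector endpoint with an explicit index."""
    payload = {"event": "SCK index routing test", "index": index}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": "Splunk " + token},
        method="POST",
    )


def send_test_event(base_url, token, index, verify_tls=True):
    """Send one event and return the parsed JSON reply from Splunk."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Skip certificate checks, as SCK does with insecureSSL: true.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = build_hec_request(base_url, token, index)
    try:
        with urllib.request.urlopen(request, context=context) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as err:
        # A rejected event comes back as an HTTP error with a JSON body.
        return json.loads(err.read())


def is_incorrect_index(reply):
    """True when the reply carries the code seen in the SCK logs."""
    return reply.get("code") == 7


# Example (not executed here; URL and token are placeholders):
# reply = send_test_event("https://splunk.example.com", "<token>", "logs-nonprod", verify_tls=False)
# print(is_incorrect_index(reply))
```

If the same token is rejected for logs-nonprod but accepted for eks-default, the token's allowed indexes on the Splunk side are the likely cause.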

vinzent avatar Oct 06 '22 13:10 vinzent

@akandimalla any updates on this?

hvaghani221 avatar Nov 02 '22 06:11 hvaghani221

Closing due to inactivity. Feel free to reopen the issue.

hvaghani221 avatar Nov 08 '22 12:11 hvaghani221

Sorry for the delayed response. This ticket is good to close.

akandimalla avatar Nov 08 '22 14:11 akandimalla