
When does the clickhouse-operator perform a DNS cache drop?

Open czhfe opened this issue 3 years ago • 9 comments

When does the clickhouse-operator perform a DNS cache drop? Do I need to add the disable_internal_dns_cache parameter when I deploy ClickHouse?

Distributed queries occasionally fail because of a wrong user/password error. Due to the nature of Kubernetes, with its dynamic IP allocation, ClickHouse can cache a stale IP -> hostname mapping and refuse connections because of a mismatched hostname.
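
For reference, a sketch of what I mean by adding the parameter (assuming keys under spec.configuration.settings are rendered into config.xml the same way as other server settings, and using the standard ClickHouse server setting name):

spec:
  configuration:
    settings:
      # assumption: the key is rendered verbatim into config.xml by the operator;
      # "1" turns the internal DNS cache off entirely
      disable_internal_dns_cache: "1"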

czhfe (Dec 23 '21 08:12)

clickhouse-operator calls SYSTEM DROP DNS CACHE in the following cases:

  • when nodes are added to / removed from the cluster
  • when schema migrations are applied

Distributed queries may fail because you changed the password for the default user. Leave it empty, since the default user is restricted to connections from cluster nodes only.
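
If you ever need to force it, the same statement can be issued manually on any node, e.g. (a sketch; the pod name follows the operator's usual chi-<chi>-<cluster>-<shard>-<replica>-<ordinal> scheme, adjust to yours):

# run the DNS cache drop by hand on one host
kubectl -n <your_namespace> exec chi-<your_chi_name>-<cluster>-0-0-0 -- \
  clickhouse-client -q "SYSTEM DROP DNS CACHE"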

Slach (Dec 23 '21 09:12)

clickhouse-operator calls SYSTEM DROP DNS CACHE in the following cases:

  • when nodes are added to / removed from the cluster
  • when schema migrations are applied

Distributed queries may fail because you changed the password for the default user. Leave it empty, since the default user is restricted to connections from cluster nodes only.

I haven't changed the password for default, but I still occasionally run into distributed query failures.

czhfe (Dec 29 '21 05:12)

clickhouse-operator generates the remote_servers config and uses the DNS names of kind: Service objects with type: ClusterIP, each of which covers exactly one ClickHouse StatefulSet (and the single pod inside it). Such a DNS name usually resolves to a constant IP via CoreDNS.
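
You can verify that yourself (a sketch; substitute one of your per-replica service names):

# resolve a per-replica ClusterIP service; the returned A record should
# stay constant for the lifetime of the Service
kubectl -n <your_namespace> run -it --rm dns-check --image=busybox --restart=Never -- \
  nslookup chi-<your_chi_name>-<cluster>-0-0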

Could you share results of the following command?

kubectl get chi -n <your_namespace> <your_chi_name> -o yaml

Slach (Dec 29 '21 08:12)

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  creationTimestamp: "2021-12-30T05:50:01Z"
  finalizers:
  - finalizer.clickhouseinstallation.altinity.com
  generation: 95
  managedFields:
  - apiVersion: clickhouse.altinity.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .: {}
          v:"finalizer.clickhouseinstallation.altinity.com": {}
      f:spec:
        f:configuration:
          f:clusters: {}
          f:profiles:
            f:default/allow_experimental_map_type: {}
          f:users:
            f:clickhouse_admin/access_management: {}
        f:defaults:
          f:distributedDDL: {}
        f:reconciling:
          .: {}
          f:policy: {}
        f:templates:
          f:podTemplates: {}
          f:serviceTemplates: {}
        f:templating:
          .: {}
          f:policy: {}
      f:status:
        .: {}
        f:action: {}
        f:actions: {}
        f:added: {}
        f:clusters: {}
        f:delete: {}
        f:deleted: {}
        f:endpoint: {}
        f:error: {}
        f:errors: {}
        f:fqdns: {}
        f:hosts: {}
        f:normalized:
          .: {}
          f:configuration:
            .: {}
            f:clusters: {}
            f:profiles:
              .: {}
              f:default/allow_experimental_map_type: {}
            f:settings:
              .: {}
              f:prometheus/asynchronous_metrics: {}
              f:prometheus/endpoint: {}
              f:prometheus/events: {}
              f:prometheus/metrics: {}
              f:prometheus/port: {}
              f:prometheus/status_info: {}
            f:users:
              .: {}
              f:clickhouse_admin/access_management: {}
              f:clickhouse_admin/networks/host_regexp: {}
              f:clickhouse_admin/networks/ip: {}
              f:clickhouse_admin/password_sha256_hex: {}
              f:clickhouse_admin/profile: {}
              f:clickhouse_admin/quota: {}
              f:default/networks/host_regexp: {}
              f:default/networks/ip: {}
              f:default/profile: {}
              f:default/quota: {}
            f:zookeeper:
              .: {}
              f:nodes: {}
          f:defaults:
            .: {}
            f:distributedDDL: {}
            f:replicasUseFQDN: {}
            f:templates:
              .: {}
              f:dataVolumeClaimTemplate: {}
              f:podTemplate: {}
              f:serviceTemplate: {}
          f:reconciling:
            .: {}
            f:policy: {}
          f:stop: {}
          f:templates:
            .: {}
            f:PodTemplatesIndex:
              .: {}
              f:clickhouse:
                .: {}
                f:distribution: {}
                f:metadata:
                  .: {}
                  f:creationTimestamp: {}
                f:name: {}
                f:podDistribution: {}
                f:spec:
                  .: {}
                  f:affinity:
                    .: {}
                    f:podAntiAffinity:
                      .: {}
                      f:requiredDuringSchedulingIgnoredDuringExecution: {}
                  f:containers: {}
                f:zone: {}
            f:ServiceTemplatesIndex:
              .: {}
              f:clickhouse-default:
                .: {}
                f:generateName: {}
                f:metadata:
                  .: {}
                  f:creationTimestamp: {}
                f:name: {}
                f:spec:
                  .: {}
                  f:ports: {}
                  f:type: {}
            f:VolumeClaimTemplatesIndex:
              .: {}
              f:clickhouse-data:
                .: {}
                f:name: {}
                f:reclaimPolicy: {}
                f:spec:
                  .: {}
                  f:accessModes: {}
                  f:resources:
                    .: {}
                    f:requests:
                      .: {}
                      f:storage: {}
                  f:storageClassName: {}
            f:podTemplates: {}
            f:serviceTemplates: {}
            f:volumeClaimTemplates: {}
          f:templating:
            .: {}
            f:policy: {}
        f:pods: {}
        f:replicas: {}
        f:shards: {}
        f:status: {}
        f:updated: {}
        f:version: {}
    manager: clickhouse-operator
    operation: Update
    time: "2021-12-30T05:50:01Z"
  - apiVersion: clickhouse.altinity.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:configuration:
          .: {}
          f:profiles:
            .: {}
            f:default/allow_experimental_map_type: {}
          f:settings:
            .: {}
            f:prometheus/asynchronous_metrics: {}
            f:prometheus/endpoint: {}
            f:prometheus/events: {}
            f:prometheus/metrics: {}
            f:prometheus/port: {}
            f:prometheus/status_info: {}
          f:users:
            .: {}
            f:clickhouse_admin/networks/ip: {}
            f:clickhouse_admin/password: {}
            f:clickhouse_admin/profile: {}
          f:zookeeper:
            .: {}
            f:nodes: {}
        f:defaults:
          .: {}
          f:templates:
            .: {}
            f:dataVolumeClaimTemplate: {}
            f:podTemplate: {}
            f:serviceTemplate: {}
        f:templates:
          .: {}
          f:volumeClaimTemplates: {}
    manager: kubectl-create
    operation: Update
    time: "2021-12-30T05:50:01Z"
  name: clickhouse
  namespace: ch1
  resourceVersion: "4960024"
  selfLink: /apis/clickhouse.altinity.com/v1/namespaces/ch1/clickhouseinstallations/clickhouse
  uid: 6b128c75-fab0-496a-a773-1a7a66c43cd6
spec:
  configuration:
    clusters:
    - address: {}
      layout:
        replicasCount: 2
        shardsCount: 2
      name: huis
      templates: {}
      zookeeper: {}
    profiles:
      default/allow_experimental_map_type: "1"
    settings:
      prometheus/asynchronous_metrics: "true"
      prometheus/endpoint: /metrics
      prometheus/events: "true"
      prometheus/metrics: "true"
      prometheus/port: "8001"
      prometheus/status_info: "true"
    users:
      clickhouse_admin/access_management: "1"
      clickhouse_admin/networks/ip: ::/0
      clickhouse_admin/password: admin
      clickhouse_admin/profile: default
    zookeeper:
      nodes:
      - host: zookeeper-0.zookeeper-headless
        port: 2181
      - host: zookeeper-1.zookeeper-headless
        port: 2181
      - host: zookeeper-2.zookeeper-headless
        port: 2181
  defaults:
    distributedDDL: {}
    templates:
      dataVolumeClaimTemplate: clickhouse-data
      podTemplate: clickhouse
      serviceTemplate: clickhouse-default
  reconciling:
    policy: ""
  templates:
    podTemplates:
    - distribution: ""
      metadata:
        creationTimestamp: null
      name: clickhouse
      podDistribution:
      - scope: Shard
        type: ShardAntiAffinity
      spec:
        containers:
        - env:
          - name: TZ
            value: Asia/Shanghai
          image: clickhouse-server/clickhouse-server:21.6.5.37
          name: clickhouse-pod
          ports:
          - containerPort: 8001
            name: metrics
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: 100m
              memory: 512Mi
      zone: {}
    serviceTemplates:
    - generateName: clickhouse-server
      metadata:
        creationTimestamp: null
      name: clickhouse-default
      spec:
        ports:
        - name: http
          port: 8123
          targetPort: 0
        - name: tcp
          port: 9000
          targetPort: 0
        type: ClusterIP
    volumeClaimTemplates:
    - name: clickhouse-data
      reclaimPolicy: Retain
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: rook-ceph-block-delete
  templating:
    policy: ""
status:
  action: ""
  actions:
  - reconcile started
  - Update Service ch1/clickhouse-server
  - Update ConfigMap ch1/chi-clickhouse-common-usersd
  - Reconcile Host 0-0 started
  - Update ConfigMap ch1/chi-clickhouse-deploy-confd-huis-0-0
  - Update StatefulSet(ch1/chi-clickhouse-huis-0-0) - started
  - Update StatefulSet(ch1/chi-clickhouse-huis-0-0) - completed
  - Update Service ch1/chi-clickhouse-huis-0-0
  - Adding tables on shard/host:0/0 cluster:huis
  - Update ConfigMap ch1/chi-clickhouse-common-configd
  - Reconcile Host 0-0 completed
  - Reconcile Host 0-1 started
  - Update ConfigMap ch1/chi-clickhouse-deploy-confd-huis-0-1
  - Update StatefulSet(ch1/chi-clickhouse-huis-0-1) - started
  - Update StatefulSet(ch1/chi-clickhouse-huis-0-1) - completed
  - Update Service ch1/chi-clickhouse-huis-0-1
  - Adding tables on shard/host:0/1 cluster:huis
  - Update ConfigMap ch1/chi-clickhouse-common-configd
  - Reconcile Host 0-1 completed
  - Reconcile Host 1-0 started
  - Update ConfigMap ch1/chi-clickhouse-deploy-confd-huis-1-0
  - Update StatefulSet(ch1/chi-clickhouse-huis-1-0) - started
  - Update StatefulSet(ch1/chi-clickhouse-huis-1-0) - completed
  - Update Service ch1/chi-clickhouse-huis-1-0
  - Adding tables on shard/host:1/0 cluster:huis
  - Update ConfigMap ch1/chi-clickhouse-common-configd
  - Reconcile Host 1-0 completed
  - Reconcile Host 1-1 started
  - Update ConfigMap ch1/chi-clickhouse-deploy-confd-huis-1-1
  - Update StatefulSet(ch1/chi-clickhouse-huis-1-1) - started
  - Update StatefulSet(ch1/chi-clickhouse-huis-1-1) - completed
  - Update Service ch1/chi-clickhouse-huis-1-1
  - Adding tables on shard/host:1/1 cluster:huis
  - Update ConfigMap ch1/chi-clickhouse-common-configd
  - Reconcile Host 1-1 completed
  - Update ConfigMap ch1/chi-clickhouse-common-configd
  - remove items scheduled for deletion
  - remove items scheduled for deletion
  - add CHI to monitoring
  - reconcile completed
  added: 0
  clusters: 1
  delete: 0
  deleted: 0
  endpoint: clickhouse-server.ch1.svc.cluster.local
  error: ""
  errors: null
  fqdns:
  - chi-clickhouse-huis-0-0.ch1.svc.cluster.local
  - chi-clickhouse-huis-0-1.ch1.svc.cluster.local
  - chi-clickhouse-huis-1-0.ch1.svc.cluster.local
  - chi-clickhouse-huis-1-1.ch1.svc.cluster.local
  hosts: 4
  normalized:
    configuration:
      clusters:
      - address:
          chiName: clickhouse
          clusterName: huis
          namespace: ch1
        layout:
          replicas:
          - address:
              chiName: clickhouse
              clusterIndex: 0
              clusterName: huis
              namespace: ch1
              replicaIndex: 0
              replicaName: "0"
            name: "0"
            shards:
            - httpPort: 8123
              interserverHTTPPort: 9009
              name: 0-0
              tcpPort: 9000
              templates:
                dataVolumeClaimTemplate: clickhouse-data
                podTemplate: clickhouse
                serviceTemplate: clickhouse-default
            - httpPort: 8123
              interserverHTTPPort: 9009
              name: 1-0
              tcpPort: 9000
              templates:
                dataVolumeClaimTemplate: clickhouse-data
                podTemplate: clickhouse
                serviceTemplate: clickhouse-default
            shardsCount: 2
            templates:
              dataVolumeClaimTemplate: clickhouse-data
              podTemplate: clickhouse
              serviceTemplate: clickhouse-default
          - address:
              chiName: clickhouse
              clusterIndex: 0
              clusterName: huis
              namespace: ch1
              replicaIndex: 1
              replicaName: "1"
            name: "1"
            shards:
            - httpPort: 8123
              interserverHTTPPort: 9009
              name: 0-1
              tcpPort: 9000
              templates:
                dataVolumeClaimTemplate: clickhouse-data
                podTemplate: clickhouse
                serviceTemplate: clickhouse-default
            - httpPort: 8123
              interserverHTTPPort: 9009
              name: 1-1
              tcpPort: 9000
              templates:
                dataVolumeClaimTemplate: clickhouse-data
                podTemplate: clickhouse
                serviceTemplate: clickhouse-default
            shardsCount: 2
            templates:
              dataVolumeClaimTemplate: clickhouse-data
              podTemplate: clickhouse
              serviceTemplate: clickhouse-default
          replicasCount: 2
          shards:
          - address:
              chiName: clickhouse
              clusterIndex: 0
              clusterName: huis
              namespace: ch1
              shardIndex: 0
              shardName: "0"
            definitionType: ""
            internalReplication: "true"
            name: "0"
            replicas:
            - httpPort: 8123
              interserverHTTPPort: 9009
              name: 0-0
              tcpPort: 9000
              templates:
                dataVolumeClaimTemplate: clickhouse-data
                podTemplate: clickhouse
                serviceTemplate: clickhouse-default
            - httpPort: 8123
              interserverHTTPPort: 9009
              name: 0-1
              tcpPort: 9000
              templates:
                dataVolumeClaimTemplate: clickhouse-data
                podTemplate: clickhouse
                serviceTemplate: clickhouse-default
            replicasCount: 2
            templates:
              dataVolumeClaimTemplate: clickhouse-data
              podTemplate: clickhouse
              serviceTemplate: clickhouse-default
          - address:
              chiName: clickhouse
              clusterIndex: 0
              clusterName: huis
              namespace: ch1
              shardIndex: 1
              shardName: "1"
            definitionType: ""
            internalReplication: "true"
            name: "1"
            replicas:
            - httpPort: 8123
              interserverHTTPPort: 9009
              name: 1-0
              tcpPort: 9000
              templates:
                dataVolumeClaimTemplate: clickhouse-data
                podTemplate: clickhouse
                serviceTemplate: clickhouse-default
            - httpPort: 8123
              interserverHTTPPort: 9009
              name: 1-1
              tcpPort: 9000
              templates:
                dataVolumeClaimTemplate: clickhouse-data
                podTemplate: clickhouse
                serviceTemplate: clickhouse-default
            replicasCount: 2
            templates:
              dataVolumeClaimTemplate: clickhouse-data
              podTemplate: clickhouse
              serviceTemplate: clickhouse-default
          shardsCount: 2
        name: huis
        templates:
          dataVolumeClaimTemplate: clickhouse-data
          podTemplate: clickhouse
          serviceTemplate: clickhouse-default
        zookeeper:
          nodes:
          - host: zookeeper-0.zookeeper-headless
            port: 2181
          - host: zookeeper-1.zookeeper-headless
            port: 2181
          - host: zookeeper-2.zookeeper-headless
            port: 2181
      profiles:
        default/allow_experimental_map_type: "1"
      settings:
        prometheus/asynchronous_metrics: "true"
        prometheus/endpoint: /metrics
        prometheus/events: "true"
        prometheus/metrics: "true"
        prometheus/port: "8001"
        prometheus/status_info: "true"
      users:
        clickhouse_admin/access_management: "1"
        clickhouse_admin/networks/host_regexp: (chi-clickhouse-[^.]+\d+-\d+|clickhouse\-clickhouse)\.ch1\.svc\.cluster\.local$
        clickhouse_admin/networks/ip: ::/0
        clickhouse_admin/password_sha256_hex: 8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
        clickhouse_admin/profile: default
        clickhouse_admin/quota: default
        default/networks/host_regexp: (chi-clickhouse-[^.]+\d+-\d+|clickhouse\-clickhouse)\.ch1\.svc\.cluster\.local$
        default/networks/ip:
        - ::1
        - 127.0.0.1
        default/profile: default
        default/quota: default
      zookeeper:
        nodes:
        - host: zookeeper-0.zookeeper-headless
          port: 2181
        - host: zookeeper-1.zookeeper-headless
          port: 2181
        - host: zookeeper-2.zookeeper-headless
          port: 2181
    defaults:
      distributedDDL: {}
      replicasUseFQDN: "false"
      templates:
        dataVolumeClaimTemplate: clickhouse-data
        podTemplate: clickhouse
        serviceTemplate: clickhouse-default
    reconciling:
      policy: unspecified
    stop: "false"
    templates:
      PodTemplatesIndex:
        clickhouse:
          distribution: Unspecified
          metadata:
            creationTimestamp: null
          name: clickhouse
          podDistribution:
          - scope: Shard
            type: ShardAntiAffinity
          spec:
            affinity:
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchLabels:
                      clickhouse.altinity.com/chi: '{chi}'
                      clickhouse.altinity.com/cluster: '{cluster}'
                      clickhouse.altinity.com/namespace: '{namespace}'
                      clickhouse.altinity.com/shard: '{shard}'
                  topologyKey: kubernetes.io/hostname
            containers:
            - env:
              - name: TZ
                value: Asia/Shanghai
              image: clickhouse-server/clickhouse-server:21.6.5.37
              name: clickhouse-pod
              ports:
              - containerPort: 8001
                name: metrics
              resources:
                limits:
                  cpu: "1"
                  memory: 2Gi
                requests:
                  cpu: 100m
                  memory: 512Mi
          zone: {}
      ServiceTemplatesIndex:
        clickhouse-default:
          generateName: clickhouse-server
          metadata:
            creationTimestamp: null
          name: clickhouse-default
          spec:
            ports:
            - name: http
              port: 8123
              targetPort: 0
            - name: tcp
              port: 9000
              targetPort: 0
            type: ClusterIP
      VolumeClaimTemplatesIndex:
        clickhouse-data:
          name: clickhouse-data
          reclaimPolicy: Retain
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
            storageClassName: rook-ceph-block-delete
      podTemplates:
      - distribution: Unspecified
        metadata:
          creationTimestamp: null
        name: clickhouse
        podDistribution:
        - scope: Shard
          type: ShardAntiAffinity
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    clickhouse.altinity.com/chi: '{chi}'
                    clickhouse.altinity.com/cluster: '{cluster}'
                    clickhouse.altinity.com/namespace: '{namespace}'
                    clickhouse.altinity.com/shard: '{shard}'
                topologyKey: kubernetes.io/hostname
          containers:
          - env:
            - name: TZ
              value: Asia/Shanghai
            image: clickhouse-server/clickhouse-server:21.6.5.37
            name: clickhouse-pod
            ports:
            - containerPort: 8001
              name: metrics
            resources:
              limits:
                cpu: "1"
                memory: 2Gi
              requests:
                cpu: 100m
                memory: 512Mi
        zone: {}
      serviceTemplates:
      - generateName: clickhouse-server
        metadata:
          creationTimestamp: null
        name: clickhouse-default
        spec:
          ports:
          - name: http
            port: 8123
            targetPort: 0
          - name: tcp
            port: 9000
            targetPort: 0
          type: ClusterIP
      volumeClaimTemplates:
      - name: clickhouse-data
        reclaimPolicy: Retain
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
          storageClassName: rook-ceph-block-delete
    templating:
      policy: manual
  pods:
  - chi-clickhouse-huis-0-0-0
  - chi-clickhouse-huis-0-1-0
  - chi-clickhouse-huis-1-0-0
  - chi-clickhouse-huis-1-1-0
  replicas: 0
  shards: 2
  status: Completed
  updated: 4
  version: 0.13.5

czhfe (Dec 30 '21 06:12)

Distributed query errors are as follows:

[screenshot: distributed query error output]

czhfe (Dec 30 '21 06:12)

Maybe the issue is related to the PTR DNS response inside your cluster: https://github.com/ClickHouse/ClickHouse/issues/17202

Could you use https://github.com/eldadru/ksniff and try to sniff the DNS traffic inside your clickhouse pod?
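
For example (a sketch of typical ksniff usage, flags as per its README; pod name taken from your status above, adjust as needed):

# capture DNS traffic from inside the clickhouse pod into a pcap file
kubectl sniff chi-clickhouse-huis-0-0-0 -n ch1 -f "port 53" -o /tmp/dns.pcap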

Slach (Dec 30 '21 10:12)

Maybe the issue is related to the PTR DNS response inside your cluster: ClickHouse/ClickHouse#17202

Could you use https://github.com/eldadru/ksniff and try to sniff the DNS traffic inside your clickhouse pod?

Running in a Kubernetes environment does indeed produce multiple DNS PTR records.

[screenshots: reverse DNS lookups returning multiple PTR records]

This distributed query error caused by multiple DNS PTR records should ideally be handled gracefully.
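
For anyone reproducing this, the multiple PTR records show up with a plain reverse lookup from inside the cluster (a sketch; <pod_ip> is a placeholder for one of the clickhouse pod IPs):

# a reverse (PTR) lookup on a pod IP may return several hostnames
kubectl -n ch1 run -it --rm ptr-check --image=busybox --restart=Never -- \
  nslookup <pod_ip>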

czhfe (Jan 01 '22 10:01)

You can change:

spec:
  configuration:
    users:
      default/networks/host_regexp: clickhouse.svc.cluster.local$
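
As I understand it, ClickHouse reverse-resolves the connecting IP and matches the resulting hostname against host_regexp, so when a pod IP has several PTR records the lookup can return a name that the stricter generated pattern does not match; a looser pattern that covers all of them avoids the rejection.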

Slach (Jan 02 '22 14:01)

You can change:

spec:
  configuration:
    users:
      default/networks/host_regexp: clickhouse.svc.cluster.local$

Thank you very much. I think I now understand the cause and how to solve the problem.

czhfe (Jan 04 '22 05:01)