
Cannot log in with multi-replica pods in a k8s cluster environment

Open wzdushu opened this issue 4 years ago • 7 comments

Version info: k8s version: 1.14.1, apollo version: 1.7.1, traefik version: 1.7, helm chart version: 0.1.1

Problem description

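I deployed apollo following this guide: https://github.com/ctripcorp/apollo/wiki/%E5%88%86%E5%B8%83%E5%BC%8F%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97#241-%E5%9F%BA%E4%BA%8Ekubernetes%E5%8E%9F%E7%94%9F%E6%9C%8D%E5%8A%A1%E5%8F%91%E7%8E%B0 and integrated LDAP (Windows AD). When apollo-portal is configured with multiple pod replicas, login fails; after switching to a single replica, login works normally.

The official configuration only includes an example for the nginx ingress controller, but our production environment uses the traefik ingress controller. I adapted the official example as shown below, but login still fails. In testing, Chrome's debugger shows that the login itself succeeds; the page just never redirects to the admin console. I don't know where the problem is and would appreciate any help.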

The Helm chart is based on: https://github.com/ctripcorp/apollo/tree/master/docs/charts

The Ingress configuration is as follows:

apiVersion: v1
items:
- apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    annotations:
      field.cattle.io/ingressState: '{"YXBvbGxvLXBvcnRhbC9tYWxsL2FpYmVlLmNuLy8vODA3MA==":""}'
      traefik.ingress.kubernetes.io/affinity: "true"
      traefik.ingress.kubernetes.io/ingress.class: traefik
      traefik.ingress.kubernetes.io/load-balancer-method: drr
      traefik.ingress.kubernetes.io/max-conn-amount: "10"
      traefik.ingress.kubernetes.io/session-cookie-name: JSESSIONID
    creationTimestamp: "2020-12-22T07:02:24Z"
    generation: 3
    labels:
      app.kubernetes.io/version: 1.7.1
    name: apollo-portal
    namespace: mall
    resourceVersion: "336484675"
    selfLink: /apis/extensions/v1beta1/namespaces/mall/ingresses/apollo-portal
    uid: a48ff897-4423-11eb-b624-ac1f6b6ca72e
  spec:
    rules:
    - host: conf.test.cn
      http:
        paths:
        - backend:
            serviceName: apollo-portal
            servicePort: 8070
          path: /
  status:
    loadBalancer: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

The Service configuration is as follows:

apiVersion: v1
kind: Service
metadata:
  annotations:
    field.cattle.io/ipAddresses: "null"
    field.cattle.io/targetDnsRecordIds: "null"
    field.cattle.io/targetWorkloadIds: "null"
  creationTimestamp: "2020-12-22T07:02:23Z"
  labels:
    app.kubernetes.io/version: 1.7.1
  name: apollo-portal
  namespace: mall
  resourceVersion: "336490784"
  selfLink: /api/v1/namespaces/mall/services/apollo-portal
  uid: a47e04ef-4423-11eb-a028-ac1f6b6cd636
spec:
  clusterIP: 10.68.65.74
  ports:
  - name: http
    port: 8070
    protocol: TCP
    targetPort: 8070
  selector:
    app: apollo-portal
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  type: ClusterIP
status:
  loadBalancer: {}

Pod information:

➜ kubectl get pod
NAME                                    READY   STATUS    RESTARTS   AGE
apollo-adminservice-86f68f989b-hpv4j    1/1     Running   0          3h54m
apollo-adminservice-86f68f989b-rv9bd    1/1     Running   0          3h54m
apollo-configservice-7467fb54f8-bs7dx   1/1     Running   0          3h54m
apollo-configservice-7467fb54f8-zbpxm   1/1     Running   0          3h54m
apollo-portal-57ddbf686f-p4vcm          1/1     Running   0          32m
apollo-portal-57ddbf686f-trb8z          1/1     Running   0          4m14s
➜ kubectl get ing
NAME            HOSTS              ADDRESS   PORTS   AGE
apollo-portal   conf.test.cn             80      178m
➜ kubectl get svc
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
apollo-adminservice    ClusterIP   10.68.6.254     <none>        8090/TCP   4h1m
apollo-configdb        ClusterIP   10.68.95.70     <none>        3306/TCP   4h1m
apollo-configservice   ClusterIP   10.68.53.26     <none>        8080/TCP   4h1m
apollo-portal          ClusterIP   10.68.65.74     <none>        8070/TCP   178m
apollo-portaldb        ClusterIP   10.68.108.159   <none>        3306/TCP   178m

wzdushu avatar Dec 22 '20 10:12 wzdushu

@nobodyiam Have you run into this kind of problem?

wzdushu avatar Dec 22 '20 10:12 wzdushu

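@iwz2099 We previously tested with the nginx controller. Could you try switching to nginx as described in the documentation? The principle is the same: the ingress needs to apply session stickiness when forwarding requests. I'm not very familiar with traefik's configuration, though.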

nobodyiam avatar Dec 23 '20 00:12 nobodyiam

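We ran into this problem too. Our guess is that with multiple portal instances, sessions are not shared between the instances. We did not set up session stickiness and simply run a single instance. The portal load is not heavy, and an outage would not affect business, so we can live with it.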
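The hypothesis above (each portal instance keeps its own sessions, so a request routed to a different instance looks unauthenticated) can be illustrated with a small simulation. This is only a sketch under that assumption: `PortalInstance`, `simulate`, and the pod names `portal-a`/`portal-b` are hypothetical and do not reflect apollo-portal's actual session handling.

```python
import itertools
import uuid


class PortalInstance:
    """Hypothetical stand-in for one portal pod with an in-memory session store."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session id -> username, visible to this instance only

    def login(self, username):
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = username
        return session_id

    def is_logged_in(self, session_id):
        return session_id in self.sessions


def simulate(pick_backend, requests_after_login=4):
    """Log in once, then report whether each follow-up request is authenticated."""
    session_id = pick_backend().login("alice")
    return [pick_backend().is_logged_in(session_id) for _ in range(requests_after_login)]


pods = [PortalInstance("portal-a"), PortalInstance("portal-b")]

# Round-robin: consecutive requests alternate between the two pods.
round_robin = itertools.cycle(pods)
round_robin_results = simulate(lambda: next(round_robin))

# Sticky: every request from this client is pinned to the same pod.
sticky_results = simulate(lambda: pods[0])

print("round-robin:", round_robin_results)
print("sticky:     ", sticky_results)
```

With round-robin routing, every request that lands on the pod that did not handle the login is rejected, which would match the reported symptom of a successful login followed by no redirect. With sticky routing, all follow-up requests are accepted.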

zhousbo avatar Dec 23 '20 03:12 zhousbo

Ingress session affinity is enough to solve this. Add the following:

metadata:
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"  # enables session affinity
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"

JamalyYao avatar Jan 08 '21 10:01 JamalyYao

Ran into the same problem, raising my hand...

scue avatar Jan 25 '21 10:01 scue

@nobodyiam @JamalyYao

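Everyone, this is already solved; I forgot to follow up. The official traefik documentation at https://doc.traefik.io/traefik/v1.7/configuration/backends/kubernetes/ says to set traefik.ingress.kubernetes.io/affinity: "true"

That is actually somewhat misleading. Switching back to sticky fixed it. My configuration is below.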

Ingress configuration:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    traefik.backend.loadbalancer.sticky: "true"
    traefik.ingress.kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/load-balancer-method: drr
    traefik.ingress.kubernetes.io/max-conn-amount: "1000"
    traefik.ingress.kubernetes.io/session-cookie-name: JSESSIONID

Service configuration:

apiVersion: v1
kind: Service
metadata:
  annotations:
    traefik.backend.loadbalancer.sticky: "true"
    traefik.ingress.kubernetes.io/load-balancer-method: drr
    traefik.ingress.kubernetes.io/session-cookie-name: JSESSIONID

wzdushu avatar Mar 11 '21 11:03 wzdushu

Similar questions have come up many times before. In most cases, hashing requests by source IP to a portal instance at the LB layer solves it.
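As a rough illustration of source-IP hashing, the sketch below maps each client IP to one backend deterministically, so repeated requests from the same address reach the same portal instance. The `pick_backend` helper and the sample IPs are hypothetical; real load balancers implement this internally, and this is not how any particular one does it.

```python
import hashlib


def pick_backend(client_ip, backends):
    """Map a client IP to one backend deterministically (hypothetical helper)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it to an index.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Pod names taken from the `kubectl get pod` output earlier in the thread.
backends = ["apollo-portal-57ddbf686f-p4vcm", "apollo-portal-57ddbf686f-trb8z"]

for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    print(ip, "->", pick_backend(ip, backends))
```

The same IP always maps to the same pod as long as the backend list is unchanged. Note that clients behind a shared NAT address all land on one instance, and adding or removing a backend changes the mapping for some clients.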

nisiyong avatar Mar 12 '21 05:03 nisiyong