apisix-ingress-controller

bug: sometimes the k8s apisixroute will not sync to apisix

Open wolgod opened this issue 2 years ago • 8 comments

Issue description

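When the content of an ApisixRoute changes, I update the ApisixRoute through the client. The ApisixRoute resource in k8s is updated and its status shows success, but the change is not synced to APISIX. This has happened many times, and I have not been able to find the cause.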

Environment

  • your apisix-ingress-controller version (output of apisix-ingress-controller version --long): 1.4.1
  • your Kubernetes cluster version (output of kubectl version): 1.18
  • if you run apisix-ingress-controller in Bare-metal environment, also show your OS version (uname -a): centos7

Minimal test code / Steps to reproduce

The initial ApisixRoute:

kind: ApisixRoute
metadata:
  name: basik-k8s-prod
  namespace: ns-1
spec:
  http:
    - authentication:
        enable: false
        type: basicAuth
      backends:
        - serviceName: basik-k8s-prod
          servicePort: 30096
          weight: 100
      match:
        hosts:
          - basik-k8s-prod.aii.com
        paths:
          - /*
      name: rule-0
      websocket: true
    - authentication:
        enable: false
        type: basicAuth
      backends:
        - serviceName: basik-k8s-prod
          servicePort: 30096
          weight: 100
      match:
        hosts:
          - zhiguanapi-basic.bbb.cn
        paths:
          - /openapi*
      name: rule-1
      websocket: true

Some time later, at 2022-09-01T10:31, I changed the routes in this ApisixRoute and updated it as follows:

	_, err = ingressClient.Create(cont, ing, metav1.CreateOptions{})
	....
	_, err = ingressClient.Update(cont, ing, metav1.UpdateOptions{})

The final resource is as follows:

apiVersion: apisix.apache.org/v2beta3
kind: ApisixRoute
metadata:
  creationTimestamp: "2022-07-14T12:20:54Z"
  generation: 4
  managedFields:
    - apiVersion: apisix.apache.org/v2beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:spec: {}
      manager: myapp
      operation: Update
      time: "2022-07-25T08:49:26Z"
    - apiVersion: apisix.apache.org/v2beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:status: {}
      manager: apisix-ingress-controller
      operation: Update
      time: "2022-08-08T11:44:30Z"
    - apiVersion: apisix.apache.org/v2beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:http: {}
      manager: k8s-api
      operation: Update
      time: "2022-09-01T10:31:24Z"
  name: basik-k8s-prod
  namespace: ns-1
  resourceVersion: "1123054337"
  selfLink: /apis/apisix.apache.org/v2beta3/namespaces/ns-2277/apisixroutes/basik-k8s-prod
  uid: 3835e88a-9b88-4e84-81c5-5f43063e63dd
spec:
  http:
    - authentication:
        enable: false
        type: basicAuth
      backends:
        - serviceName: basik-k8s-prod
          servicePort: 30096
          weight: 100
      match:
        hosts:
          - basik-k8s-prod.aii.com
        paths:
          - /*
      name: rule-0
      websocket: true
    - authentication:
        enable: false
        type: basicAuth
      backends:
        - serviceName: basik-k8s-prod
          servicePort: 30096
          weight: 100
      match:
        hosts:
          - zhiguanapi-basic.bbb.cn
        paths:
          - /openapi/*
      name: rule-1
      websocket: true
    - authentication:
        enable: false
        type: basicAuth
      backends:
        - serviceName: basik-k8s-prod
          servicePort: 30096
          weight: 100
      match:
        hosts:
          - zhiguanapi.ccc.cn
        paths:
          - /openapi_v2/*
      name: rule-2
      websocket: true
    - authentication:
        enable: false
        type: basicAuth
      backends:
        - serviceName: basik-k8s-prod
          servicePort: 30096
          weight: 100
      match:
        hosts:
          - zhiguanapi-basic.bbb.cn
        paths:
          - /openapi_v2/*
      name: rule-3
      websocket: true
status:
  conditions:
    - message: Sync Successfully
      observedGeneration: 3
      reason: ResourcesSynced
      status: "True"
      type: ResourcesAvailable

The change is that the routes went from basik-k8s-prod.aii.com/* and zhiguanapi-basic.bbb.cn/openapi* to basik-k8s-prod.aii.com/*, zhiguanapi-basic.bbb.cn/openapi/*, zhiguanapi.ccc.cn/openapi_v2/* and zhiguanapi-basic.bbb.cn/openapi_v2/*.

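The routes were synced only after I restarted the controller. As it stands, the ApisixRoute status reports a successful sync, but nothing was actually synced to APISIX. I read the source code and could not find the cause. I hope to get a reply.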
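
One detail visible in the resource dump above is that metadata.generation is 4 while the only status condition reports observedGeneration: 3. Whether that mismatch is related to the missed sync is not established in this thread. The following is a minimal sketch, in Go with only the standard library, of how such a lag could be flagged. The Condition type and the isStale helper are hypothetical names for illustration and are not part of apisix-ingress-controller.

```go
package main

import "fmt"

// Condition mirrors the status.conditions fields shown in the dump above.
// It is a hypothetical illustration type, not the controller's own type.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	ObservedGeneration int64
}

// isStale reports whether no "True" ResourcesAvailable condition has
// observed the object's current generation.
func isStale(generation int64, conds []Condition) bool {
	for _, c := range conds {
		if c.Type == "ResourcesAvailable" && c.Status == "True" &&
			c.ObservedGeneration >= generation {
			return false
		}
	}
	return true
}

func main() {
	// Values copied from the resource above: generation 4, observedGeneration 3.
	conds := []Condition{{
		Type:               "ResourcesAvailable",
		Status:             "True",
		Reason:             "ResourcesSynced",
		ObservedGeneration: 3,
	}}
	fmt.Println(isStale(4, conds)) // prints "true"
}
```

A check like this only says that the reported status predates the latest spec change. It does not say what is actually configured in APISIX.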

Actual result

The routes can be synced successfully.

Error log

No error log was found.

Expected result

No response

wolgod avatar Sep 02 '22 06:09 wolgod

Can you give me reproduction steps that do not depend on your business setup?

AlinsRan avatar Sep 02 '22 08:09 AlinsRan

Can you test with version v1.5.0-rc1? The Helm chart has been released (since it's a pre-release, you need to add the --devel flag to the helm command).

tao12345666333 avatar Sep 02 '22 08:09 tao12345666333

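Can you give me reproduction steps that do not depend on your business setup?

I don't know when the problem occurs. It shows up every few days in production use. Because my routes are also generated into traefik, I discovered the missing routes through a scheduled job that compares the routes in traefik and APISIX, and then traced back what had happened. The problem has never appeared in my own tests. I don't know whether it is related to the large number of routes in production: there are more than 3,000 ApisixRoute resources.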

wolgod avatar Sep 03 '22 13:09 wolgod

Maybe you can try to enable the debug mode log and see if there is any relevant information.

tao12345666333 avatar Sep 04 '22 07:09 tao12345666333

I have a similar problem. I observed that a normal sync generates two ResourcesSynced events, while an abnormal sync generates only one.

machinly avatar Sep 05 '22 02:09 machinly

@machinly Can you provide steps to reproduce? Or provide more relevant information, thank you

tao12345666333 avatar Sep 05 '22 06:09 tao12345666333

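Today I found several more missing routes. In every case, several new routes had been added to an ApisixRoute that already contained one route. When I changed the websocket field of one of those routes to false, that route was synced, but the other newly added routes were not. My guess is that the routes already exist in memory, so the comparison sees no change and nothing is updated. Manually changing a field of one of the other routes also causes it to sync.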

I tried to determine whether a failed call to APISIX caused the loss, but I found no related logs.

  - authentication:
      enable: false
      type: basicAuth
    backends:
    - serviceName: caaaaa-aut
      servicePort: 41160
      weight: 100
    match:
      hosts:
      - testaa.com
      paths:
      - /orderlist/*
    name: rule-4
    websocket: true

wolgod avatar Sep 09 '22 06:09 wolgod

You can check both the ingress controller's and APISIX's logs.

If the controller synced the resource to APISIX, you will find it in the logs.

tao12345666333 avatar Sep 09 '22 21:09 tao12345666333

This issue has been marked as stale due to 90 days of inactivity. It will be closed in 30 days if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Dec 18 '22 01:12 github-actions[bot]

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

github-actions[bot] avatar Jan 17 '23 01:01 github-actions[bot]