
[Bug]: Paramtable cache causes some dynamic configs to become non-dynamic.

Open aoiasd opened this issue 1 year ago • 16 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Some configs that are supposed to support dynamic changes have become non-dynamic: updated values are not picked up at runtime.
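The title attributes this to a cache in the paramtable. Here is a minimal Go sketch of how such a cache can make a dynamic config look static. It assumes a table that caches parsed values and a refresh path that may or may not invalidate them. paramTable, GetInt, Update and the key example.maxItems are hypothetical names for illustration, not the Milvus paramtable API:

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// paramTable is a hypothetical, simplified config table that caches parsed
// values. It is NOT the Milvus paramtable implementation.
type paramTable struct {
	mu    sync.RWMutex
	raw   map[string]string // latest raw values from the config source
	cache map[string]int    // parsed values, keyed by config name
}

func newParamTable() *paramTable {
	return &paramTable{raw: map[string]string{}, cache: map[string]int{}}
}

// GetInt returns the cached parsed value if present; otherwise it parses the
// current raw value under the write lock and caches the result.
func (p *paramTable) GetInt(key string) int {
	p.mu.RLock()
	v, ok := p.cache[key]
	p.mu.RUnlock()
	if ok {
		return v
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	n, err := strconv.Atoi(p.raw[key])
	if err != nil {
		n = 0 // hypothetical fallback for missing or unparsable values
	}
	p.cache[key] = n
	return n
}

// Update simulates a dynamic config change event. With invalidate=false it
// reproduces the reported symptom: the raw value changes, but readers keep
// getting the stale cached value.
func (p *paramTable) Update(key, value string, invalidate bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.raw[key] = value
	if invalidate {
		delete(p.cache, key)
	}
}

func main() {
	// Without invalidation: the update is ignored by readers.
	stale := newParamTable()
	stale.Update("example.maxItems", "10", false)
	fmt.Println(stale.GetInt("example.maxItems")) // 10
	stale.Update("example.maxItems", "20", false)
	fmt.Println(stale.GetInt("example.maxItems")) // 10 (stale)

	// With invalidation: the next read re-parses the new raw value.
	fresh := newParamTable()
	fresh.Update("example.maxItems", "10", true)
	fmt.Println(fresh.GetInt("example.maxItems")) // 10
	fresh.Update("example.maxItems", "20", true)
	fmt.Println(fresh.GetInt("example.maxItems")) // 20
}
```

The fix direction this sketch illustrates is that every dynamic update path must also drop or refresh the cached entry. How the real paramtable wires its watch events is not shown in this issue.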

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

aoiasd, May 29 '24 11:05

/assign @aoiasd /unassign @yanliang567

aoiasd, May 29 '24 11:05

@aoiasd Does ttMsgEnabled support dynamic changes? Please help @anhnch30820 with this issue: https://github.com/zilliztech/milvus-operator/issues/129
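Separately from the paramtable cache, a flag such as ttMsgEnabled can also look non-dynamic if a component copies its value once at startup instead of reading it on each use. The following is a hypothetical Go sketch of that difference. It requires Go 1.19+ for atomic.Bool. dynamicBool, snapshotComponent and liveComponent are illustrative names, not Milvus types:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dynamicBool is a hypothetical holder for a refreshable boolean config.
type dynamicBool struct{ v atomic.Bool }

func (d *dynamicBool) Get() bool  { return d.v.Load() }
func (d *dynamicBool) Set(b bool) { d.v.Store(b) }

// snapshotComponent copies the flag once at construction time, so it never
// observes later config changes.
type snapshotComponent struct{ enabled bool }

func (c snapshotComponent) Enabled() bool { return c.enabled }

// liveComponent keeps a reference to the flag and reads it on every call.
type liveComponent struct{ flag *dynamicBool }

func (c liveComponent) Enabled() bool { return c.flag.Get() }

func main() {
	flag := &dynamicBool{}
	flag.Set(true)
	snap := snapshotComponent{enabled: flag.Get()}
	live := liveComponent{flag: flag}

	flag.Set(false) // simulate a dynamic config update
	fmt.Println(snap.Enabled(), live.Enabled()) // true false
}
```

Only the per-call read lets a refreshed value take effect. Which pattern each Milvus component uses for ttMsgEnabled is not established in this thread.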

@aoiasd The log for Milvus v2.4.4 can be found at https://github.com/user-attachments/files/15706959/milvus-milvus-proxy-6c98fb49c4-622fp_proxy.log.

haorenfsa, Jun 24 '24 03:06

@anhnch30820 Could you please provide your Milvus v2.4.5 logs here?

haorenfsa, Jun 24 '24 03:06

@haorenfsa @aoiasd Here are the logs for Milvus v2.4.5: milvus-milvus-proxy-746fbd5d-724dk_proxy.log

anhnch30820, Jun 24 '24 04:06

/assign @aoiasd

haorenfsa, Jun 24 '24 04:06


(Screenshots of proxy log lines omitted.) From the proxy logs we can see that ttMsgEnabled has been changed dynamically in the proxy. Some other problem is preventing the collection from being loaded, but with only the proxy logs I can't find the cause. Could you provide the querycoord and querynode logs?

aoiasd, Jun 24 '24 06:06

@aoiasd Here are the querycoord and querynode logs: milvus-milvus-querynode-0-865dff889-xftxs_querynode.log, milvus-milvus-querycoord-5f558b84ff-4dw8p_querycoord.log

(Screenshot omitted.) There seems to be a problem with the Kafka connection: the querynode cannot consume messages from Kafka after 2024/06/24 04:44:47.960. Is this Kafka config correct? (Screenshot omitted.)

aoiasd, Jun 25 '24 07:06

@aoiasd I got this Kafka config from the Milvus sizing tool:

apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: milvus
  labels:
    app: milvus
spec:
  dependencies:
    msgStreamType: kafka
    kafka:
      inCluster:
        values:
          heapOpts: "-Xmx4096M -Xms4096M"
          persistence:
            enabled: true
            storageClass:
            accessMode: ReadWriteOnce
            size: 40Gi
          resources:
            limits:
              cpu: 2
              memory: 13Gi
          zookeeper:
            enabled: true
            replicaCount: 3
            heapSize: 1024  # zk heap size in MB
            persistence:
              enabled: true
              storageClass: ""
              accessModes:
                - ReadWriteOnce
              size: 20Gi #SSD Required
            resources:
              limits:
                cpu: 1
                memory: 2Gi

anhnch30820, Jun 25 '24 08:06

@anhnch30820 It's very likely that your Kafka pods are down. Please check their status. If their disks are full, you may also need to increase the size of their PVCs.

haorenfsa, Jun 26 '24 03:06

@haorenfsa @aoiasd Those pods are running without any problem. I also tried Pulsar, but it doesn't work either, just like Kafka. Here is the querynode log when using Pulsar: milvus-milvus-querynode-0-596f9b47fb-szs6p_querynode.log

anhnch30820, Jun 26 '24 07:06

@anhnch30820 The log you provided suggests it works fine. There seem to be no client actions after your Milvus instance started.

haorenfsa, Jun 26 '24 09:06

@haorenfsa No, it doesn't work. The collection stays unloaded and loading is stuck at 0%.

anhnch30820, Jun 26 '24 09:06

@anhnch30820 Could you provide more information, such as the output of kubectl get pods and kubectl describe milvus?

haorenfsa, Jun 27 '24 02:06

@haorenfsa Here they are, please take a look: get_pods.log, describe_milvus.log

anhnch30820, Jun 27 '24 02:06

@anhnch30820 The status does look good.

@aoiasd what else do you need to figure out what's happening?

haorenfsa, Jun 28 '24 04:06

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot], Jul 30 '24 05:07