
[Bug]: fail to start server while connecting source milvus with kafka

Open zyyworktable opened this issue 10 months ago • 11 comments

Current Behavior

I downloaded the precompiled cdc-bin-v2.0.0-rc4, configured cdc.yaml, and successfully synchronized data from a source Milvus (deployed on Kubernetes, using Pulsar) to the target Milvus. However, when I configured the server to use a source Milvus cluster (also deployed on Kubernetes) that uses Kafka, the server crashed at startup with the following error:

[2025/04/15 16:28:59.892 +08:00] [INFO] [paramtable/component_param.go:4318] ["DeployModeEnv is not set, use default"] [default=0.5]
[2025/04/15 16:28:59.893 +08:00] [INFO] [paramtable/hook_config.go:21] ["hook config"] [hook={}]
[2025/04/15 16:28:59.893 +08:00] [INFO] [tag/tag.go:34] ["base info"] [BuildTime=unknown] [GitCommit=unknown] [GoVersion=unknown]
[2025/04/15 16:28:59.895 +08:00] [DEBUG] [[email protected]/call.go:35] ["retrying of unary invoker"] [target=etcd-endpoints://0xc0007e88c0/132.xx.xx.xx:30029] [attempt=0]
[2025/04/15 16:28:59.898 +08:00] [DEBUG] [[email protected]/call.go:35] ["retrying of unary invoker"] [target=etcd-endpoints://0xc0007e88c0/132.xx.xx.xx:30029] [attempt=0]
[2025/04/15 16:28:59.901 +08:00] [DEBUG] [[email protected]/call.go:35] ["retrying of unary invoker"] [target=etcd-endpoints://0xc000982700/132.xx.xx.xx:38215] [attempt=0]
[2025/04/15 16:28:59.904 +08:00] [INFO] [kafka/kafka_client.go:70] ["init kafka Config "] [commonConfig="[reconnect.backoff.ms:20 reconnect.backoff.max.ms:5000 bootstrap.servers:132.xx.xx.xx:31591 api.version.request:true]"] [extraConsumerConfig="[]"] [extraProducerConfig="[]"]
[2025/04/15 16:28:59.904 +08:00] [INFO] [msgstream/mq_msgstream.go:118] ["Msg Stream state"] [can_produce=true]
milvus-cdc: dl-call-libc-early-init.c:37: _dl_call_libc_early_init: Assertion `sym != NULL' failed.
Aborted (core dumped).

The used cdc.yaml is as follows:

address: 0.0.0.0:8444
maxTaskNum: 100
metaStoreConfig:
  storeType: etcd
  etcdEndpoints:
    - 132.xxx.xx.xx:30029
  rootPath: cdc-by-dev
sourceConfig:
  etcd:
    address:
      - http://132.xxx.xx.xx:38215
    rootPath: by-dev
    metaSubPath: meta
    enableAuth: false
  readChanLen: 10
  defaultPartitionName: _default
  replicateChan: by-dev-replicate-msg
  kafka:
    address: 132.xxx.xx.xx:31591
maxNameLength: 256
logLevel: debug
detectDeadLock: false

What could be the cause of this issue? How can I fix it?

Expected Behavior

No response

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

zyyworktable avatar Apr 15 '25 08:04 zyyworktable

Are there any extra settings I need to add to the config file?

zyyworktable avatar Apr 15 '25 08:04 zyyworktable

@zyyworktable This seems to be because the local Kafka is missing some files. Milvus itself also uses Kafka as its MQ. Can Milvus start normally? If so, you can try deploying CDC on the machine where Milvus is located.

SimFG avatar Apr 15 '25 09:04 SimFG

@zyyworktable This seems to be because the local Kafka is missing some files. Milvus itself also uses Kafka as its MQ. Can Milvus start normally? If so, you can try deploying CDC on the machine where Milvus is located.

The Milvus instance with Kafka is working well, and all of its components are deployed in K8s. I'm sure the CDC machine can reach the Milvus machine, since the Pulsar-based Milvus and the Kafka-based Milvus are in the same K8s cluster, and the CDC task against the Pulsar-based Milvus succeeded. Is something wrong with the CDC config?

zyyworktable avatar Apr 16 '25 02:04 zyyworktable
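The reachability claim above can be checked directly from the CDC machine. The following is a minimal sketch, not part of milvus-cdc: `can_connect` and `check_endpoints` are hypothetical helper names, and the endpoints passed in should be the etcd and Kafka addresses from cdc.yaml. Note that a successful TCP connection to the Kafka bootstrap address does not guarantee that the listener addresses advertised by the brokers are also reachable from outside the K8s cluster.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout: float = 3.0) -> dict:
    """Map each 'host:port' string to its TCP reachability."""
    results = {}
    for endpoint in endpoints:
        # rpartition splits on the last ':' so the port is always the final field.
        host, _, port = endpoint.rpartition(":")
        results[endpoint] = can_connect(host, int(port), timeout)
    return results
```

For example, `check_endpoints(["<etcd-host>:38215", "<kafka-host>:31591"])`, with the placeholders replaced by the real addresses, returns a dictionary that shows which endpoints accept connections.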

@zyyworktable This seems to be because the local Kafka is missing some files. Milvus itself also uses Kafka as its MQ. Can Milvus start normally? If so, you can try deploying CDC on the machine where Milvus is located.

The K8s cluster where Milvus is deployed is not suitable for deploying CDC. Are there any other suggestions that might help find the cause of the failure? Thanks a lot for your suggestions and help. Best wishes.

zyyworktable avatar Apr 16 '25 03:04 zyyworktable

@zyyworktable One solution is to run CDC in the source Milvus pod. The problem arises mainly because the Kafka connection library currently in use depends on C++.

SimFG avatar Apr 16 '25 07:04 SimFG
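The assertion in the crash log comes from glibc's dynamic loader (dl-call-libc-early-init.c). One possible cause, stated here as an assumption rather than a confirmed diagnosis for this issue, is that the binary resolves a libc or other shared library on the CDC host that does not match the loader. A rough way to inspect this is to list what `ldd` resolves for the binary. In the sketch below, `parse_ldd_output` and `missing_libraries` are hypothetical helper names, and the binary path is whatever location the CDC binary was unpacked to.

```python
import subprocess


def parse_ldd_output(text: str) -> dict:
    """Map each library name in `ldd` output to its resolved path (None if not found)."""
    libs = {}
    for line in text.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # the vdso and loader lines usually carry no "=>"
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("("):
            continue  # older ldd prints the vdso as "name => (0x...)" with no path
        if target.startswith("not found"):
            libs[name] = None
        else:
            # Drop the trailing load address such as " (0x00007f...)".
            libs[name] = target.split(" (")[0].strip()
    return libs


def missing_libraries(binary_path: str) -> list:
    """Run `ldd` on the binary and return the names of unresolved libraries."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=False
    )
    return sorted(name for name, path in parse_ldd_output(result.stdout).items() if path is None)
```

Running the same check on the CDC host and inside the source Milvus pod, then comparing the resolved libc paths, may show whether the two environments differ.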

@zyyworktable One solution is to run CDC in the source Milvus pod. The problem arises mainly because the Kafka connection library currently in use depends on C++.

I have a similar problem. Is there any way to solve it? Thanks. Here is some system information and the key startup logs:

SYS VERSION:
kafka: 2.7.1
source: milvus 2.3.+ distributed
target: milvus 2.3.+ standalone

STARTUP LOGS:
[2025/06/10 02:06:56.071 +00:00] [INFO] [kafka/kafka_client.go:70] ["init kafka Config "] [commonConfig="[bootstrap.servers:10.xx.154.xxx:9092,10.xx.154.xxx:9092,10.xx.154.xxx:9092 api.version.request:true reconnect.backoff.ms:20 reconnect.backoff.max.ms:5000]"] [extraConsumerConfig="[]"] [extraProducerConfig="[]"]
... ...
[2025/06/10 02:07:36.075 +00:00] [WARN] [kafka/kafka_consumer.go:139] ["consume msg failed"] [topic=by-dev-replicate-msg] [groupID=cdc-5f451eeb328c4dd8ac24ad622bc469cf-8444-by-dev-replicate-msg_fd00f58fd2984e069e85e2a89dd8e6d6v0-true] [error="Local: Timed out"]

Skkypy avatar Jun 10 '25 02:06 Skkypy

@Skkypy This warning will not have any impact. This warning indicates that Kafka has been connected normally.

SimFG avatar Jun 10 '25 07:06 SimFG

@Skkypy This warning will not have any impact. This warning indicates that Kafka has been connected normally.

After creating a new collection in my Milvus cluster with Kafka messaging enabled, data still fails to sync. I then searched through the GitHub code for milvus-cdc and confluent-kafka-go, but couldn’t locate the file mentioned in the logs: [kafka/kafka_consumer.go:139]. Could you tell me which version of the Kafka SDK is being used? Any help would be appreciated. Thanks!

Skkypy avatar Jun 20 '25 09:06 Skkypy

github.com/confluentinc/confluent-kafka-go/v2 v2.5.3

SimFG avatar Jun 20 '25 09:06 SimFG

github.com/confluentinc/confluent-kafka-go/v2 v2.5.3

In kafka-go v2.5.3, the kafka folder doesn't include the Go file kafka_consumer.go. Is there anything I missed?

Skkypy avatar Jun 20 '25 13:06 Skkypy

If only WARN logs are output, the problem is not with the Kafka library. I recommend starting with the configuration.

SimFG avatar Jun 20 '25 14:06 SimFG
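To confirm that a log really contains nothing above WARN, as the last reply assumes, the entries can be counted by level. This is a minimal sketch that assumes the bracketed layout seen in the logs quoted in this thread (timestamp, level, source location, quoted message); `count_levels` and `entries_with_level` are hypothetical helper names, and entries whose source location was garbled during extraction will not match the pattern.

```python
import re
from collections import Counter

# Matches: [timestamp] [LEVEL] [file.go:line] ["message"]
ENTRY = re.compile(
    r'\[(?P<ts>\d{4}/\d{2}/\d{2} [^\]]+)\] '
    r'\[(?P<level>[A-Z]+)\] '
    r'\[(?P<loc>[^\]]+)\] '
    r'\["(?P<msg>[^"]*)"\]'
)


def count_levels(log_text: str) -> Counter:
    """Count log entries per level; also works when entries share one line."""
    return Counter(m.group("level") for m in ENTRY.finditer(log_text))


def entries_with_level(log_text: str, levels=("ERROR", "PANIC", "FATAL")) -> list:
    """Return (timestamp, location, message) for entries at the given levels."""
    return [
        (m.group("ts"), m.group("loc"), m.group("msg"))
        for m in ENTRY.finditer(log_text)
        if m.group("level") in levels
    ]
```

If `entries_with_level` returns an empty list for the full CDC log, the log holds no ERROR-level or higher entries in this format, which is consistent with looking at the configuration next.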