ticdc: Support Multiple Downstream Addresses for MySQL

Open • wlwilliamx opened this issue 1 year ago • 3 comments

Is your feature request related to a problem?

In the current version, TiCDC can connect to only one server of a MySQL-compatible database cluster per changefeed. If that connection is lost, the changefeed must be recreated, because TiCDC does not automatically connect to another available server in the cluster.

Describe the feature you'd like

As stated in the title, the --sink-uri option could accept multiple MySQL-compatible downstream addresses when a user creates or updates a changefeed. When one of the downstream servers fails, TiCDC can automatically switch to another available server from the provided list and continue working, thereby providing high availability.

NOTE: This feature is unrelated to load balancing; it is solely for fault tolerance.

Therefore, the value of the --sink-uri option in the command can be changed to the following format: [scheme]://[user[:password]@][host[:port]][,host[:port]][,host[:port]][/path][?param1=value1&paramN=valueN]

Related code

In the TiCDC code, the following places are involved in establishing a connection with a MySQL-compatible downstream:

  • newMySQLSyncPointStore(): cdc/syncpointstore/mysql_syncpoint_store.go, needs modification.
  • CreateMySQLDBConn(): pkg/sink/mysql/db_helper.go. Places where CreateMySQLDBConn() is used:
    • NewDDLSink(): cdc/sink/ddlsink/mysql/mysql_ddl_sink.go, establishes a connection with the downstream database for the DDL sink, needs modification.
    • NewMySQLSink(): cdc/sink/dmlsink/txn/txn_dml_sink.go, establishes a connection with the downstream database for the DML sink, needs modification.
    • NewObserver(): pkg/sink/observer/observer.go, establishes a connection with the downstream database so that the Owner can create Observers and periodically query certain performance metrics of the downstream TiDB via SQL, needs modification.
    • ~~TestNewMySQLTimeout(): cdc/sink/dmlsink/txn/mysql/mysql_test.go, a unit test for timeouts, no change needed.~~
    • ~~checkBDRMode(): cdc/sink/validator/validator.go, temporarily establishes a connection to the downstream database to check whether BDR mode is supported, no change needed.~~
    • ~~doVerify(): pkg/upstream/upstream.go, temporarily establishes a connection to the upstream database to authenticate the upstream user, no change needed.~~
  • ~~openDB(): cmd/kafka-consumer/writer.go, used by kafka-consumer to open the upstream database (for checking diff), no change needed.~~
  • ~~A bunch of tests, no change needed.~~

Describe alternatives you've considered

No response

Teachability, Documentation, Adoption, Migration Strategy

No response

wlwilliamx, Aug 07 '24 02:08

IMO it's better to use another layer, such as a load balancer, to solve this problem. The CDC sink should only care about the output protocol; high availability of the downstream is not the responsibility of CDC.

lance6716, Aug 07 '24 15:08

> IMO it's better to use another layer, such as a load balancer, to solve this problem. The CDC sink should only care about the output protocol; high availability of the downstream is not the responsibility of CDC.

Thank you for your feedback. While using a load balancer is indeed a common solution for high availability, it can also become a single point of failure, especially if not properly configured or if the load balancer itself encounters issues. By allowing TiCDC to support multiple downstream addresses natively, we can add an extra layer of redundancy. This would enable TiCDC to automatically switch to another available server in the cluster if the primary server fails, providing more robust fault tolerance. This approach could be more reliable in scenarios where a load balancer might not be feasible or adds additional complexity.

Moreover, by integrating this functionality directly into TiCDC, we simplify the deployment and management of the system, as users wouldn’t need to rely on external solutions for high availability. This makes TiCDC more resilient and easier to use in a variety of environments.

wlwilliamx, Aug 12 '24 03:08

@BenMeadowcroft Please take a look

flowbehappy, Aug 14 '24 03:08