cdc-apache-cassandra
[documentation] Configuring CDC source with a Cassandra cluster behind a router
While the CDC agent publishes mutations to the dirty topic in Pulsar, the deployed C* source needs to query the Cassandra nodes back for the converged record before publishing to the clean topic. This setup can be challenging because it is not obvious how the source connector should reach the Cassandra cluster and perform node discovery.
It would be nice to document a reference network topology with Cassandra sitting behind a NAT address, and to instruct users on how to choose their source contact points.
A few notes:
- The Cassandra driver automatically discovers the remaining nodes after connecting to an initial set of nodes defined by the contact points.
- Contact points are configured on the source via
--source-config "{
\"keyspace\": \"ks1\",
\"table\": \"table1\",
...
\"contactPoints\": \"localhost OR NAT address/etc.\",
...
}"
- Cassandra nodes have a few relevant settings in the cassandra.yaml conf file (namely listen_address, rpc_address, broadcast_address and broadcast_rpc_address). Check the advanced settings section.
- Seed nodes are good candidates for bootstrapping the driver (and can go in contactPoints). It might be reasonable to expose only those via NAT and keep the non-seed nodes private. (More regarding seed points)
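As a starting point for the reference topology, here is a rough sketch of how the four address settings mentioned above could look in cassandra.yaml on a node that sits behind NAT. The IP addresses are placeholders I made up for illustration (10.0.0.5 as the node's private address, 203.0.113.10 as its NAT address); they are assumptions, not values from a tested deployment.

```yaml
# cassandra.yaml fragment for a node reachable by clients through NAT.
# All addresses below are hypothetical placeholders.

# Private interface used for inter-node (gossip/storage) traffic.
listen_address: 10.0.0.5

# Address advertised to other Cassandra nodes. Defaults to listen_address,
# so it only needs to be set when nodes reach each other across networks.
# broadcast_address: 10.0.0.5

# Private interface on which the node accepts client (CQL) connections.
rpc_address: 10.0.0.5

# Address advertised to drivers and other clients. The driver learns peer
# addresses from this value during node discovery, so it should be an
# address the source connector can actually route to (here, the NAT address).
broadcast_rpc_address: 203.0.113.10
```

With something like this in place, the source's contactPoints would presumably point at the NAT address of one or more of these nodes. If only the seed nodes are exposed, discovery will still report the non-seed nodes, so whether the connector can work with unreachable peers is something the documentation should verify.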