pulsar-helm-chart

Unable to connect to Pulsar in Multi AKS cluster setup

Open SrikanthNampally opened this issue 7 months ago • 2 comments

@lhotari Hello, we have a Pulsar cluster deployed across different AKS regions, along with Pulsar Proxy. When clients connect through the proxy endpoint, it does not resolve to the external advertised listener that is configured in the broker config. The error we are getting is below. Is there a specific config or parameter needed for this to work in a multi-AKS setup (a single Pulsar cluster deployed across multiple AKS clusters)?

WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0xb9d71e43, L:/ xxx.xx.xxx.xx:43504 - R:pulsar-proxy.d9.abc.musea2.azdt.abc.com/xxx.xx.xx.aaa:6651] Received error from server: Namespace bundle for topic (persistent://apache/pulsar/test-topic-1-partition-0) not served by this instance:pulsar-test-broker-1.pulsar-test-broker.pulsar-d9-fleet.svc.cluster.local:8080. Please redo the lookup. Request is denied: namespace=apache/pulsar

2025-05-28T18:15:10,863+0000 [pulsar-client-io-3-1] INFO org.apache.pulsar.client.impl.ProducerImpl - [persistent://apache/pulsar/test-topic-1-partition-0] [null] Temporary error in creating producer: {"errorMsg":"Namespace bundle for topic (persistent://apache/pulsar/test-topic-1-partition-0) not served by this instance:pulsar-test-broker-1.pulsar-test-broker.pulsar-d9-fleet.svc.cluster.local:8080. Please redo the lookup. Request is denied: namespace=apache/pulsar","reqId":2726814634015880194, "remote":"pulsar-proxy.d9.a182198.musea2.azdt.abc.com/ xxx.xx.xxx.xx:6651", "local":"/ xxx.xx.xxx.xx:43504"}

2025-05-28T18:15:10,863+0000 [pulsar-client-io-3-1] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://apache/pulsar/test-topic-1-partition-0] [null] Error connecting to broker: org.apache.pulsar.client.api.PulsarClientException$LookupException: {"errorMsg":"Namespace bundle for topic (persistent://apache/pulsar/test-topic-1-partition-0) not served by this instance:pulsar-test-broker-1.pulsar-test-broker.pulsar-d9-fleet.svc.cluster.local:8080. Please redo the lookup. Request is denied: namespace=apache/pulsar","reqId":2726814634015880194, "remote":"pulsar-proxy.d9.a182198.musea2.azdt.abc.com/ xxx.xx.xxx.xx:6651", "local":"/ xxx.xx.xxx.xx:43504"}

2025-05-28T18:15:10,863+0000 [pulsar-client-io-3-1] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://apache/pulsar/test-topic-1-partition-0] [null] Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException$LookupException: {"errorMsg":"Namespace bundle for topic (persistent://apache/pulsar/test-topic-1-partition-0) not served by this instance:pulsar-test-broker-1.pulsar-test-broker.pulsar-d9-fleet.svc.cluster.local:8080. Please redo the lookup. Request is denied: namespace=apache/pulsar","reqId":2726814634015880194, "remote":"pulsar-proxy.d9.a182198.musea2.azdt.abc.com/ xxx.xx.xxx.xx:6651", "local":"/ xxx.xx.xxx.xx:43504"} -- Will try again in 0.1 s

2025-05-28T18:15:10,965+0000 [pulsar-timer-9-1] INFO org.apache.pulsar.client.impl.ConnectionHandler - [persistent://apache/pulsar/test-topic-1-partition-0] [null] Reconnecting after connection was closed
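
The errors above name the broker by a cluster-internal address after `not served by this instance:`. As a minimal sketch for pulling that address out of a client log and checking whether it is a Kubernetes-internal name, something like the following could be used. The helper names `rejecting_brokers` and `is_cluster_local` are hypothetical, and the suffix check assumes the default `svc.cluster.local` cluster domain.

```shell
#!/bin/sh
# Hypothetical helper: read a client log on stdin and print each distinct
# broker address that follows "not served by this instance:".
rejecting_brokers() {
  grep -o 'not served by this instance:[^ ]*' |
    sed -e 's/^.*instance://' -e 's/\.$//' |   # drop the prefix and the trailing period
    sort -u
}

# Hypothetical helper: succeed if host:port points at a cluster-internal
# Kubernetes service name (assumes the default cluster domain).
is_cluster_local() {
  case "${1%:*}" in                            # strip the :port suffix first
    *.svc.cluster.local) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `rejecting_brokers < client.log` (with `client.log` standing in for wherever the client output was captured) would list the addresses, and each one could then be passed to `is_cluster_local` to see whether a client outside that AKS cluster could be expected to resolve it.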

SrikanthNampally • May 28 '25 18:05

> Is there a specific config or parameter for this to work in a multi AKS cluster. (Single Pulsar cluster deployed across multiple AKS clusters)

Is this somehow related to the Apache Pulsar Helm chart? What type of configuration are you using for this deployment?

lhotari • May 30 '25 15:05

Yes, I was able to resolve this by creating an advertisedListener for each pod (with an LB endpoint that can be resolved across clusters), but now I'm running into another issue. Although I'm able to create partitioned topics and publish to them successfully, I'm not able to create subscriptions or non-partitioned topics. Below is the broker startup script from my StatefulSet. When I check the startup script, it's still using svc.local endpoints. I tried deleting the znode and restarting the pod, but it didn't help.

My current setup is as follows: AKS clusters in 3 US regions, which I VNet-peered, with private DNS linked to all AKS clusters, and one Pulsar cluster deployed across these 3 AKS clusters (3 ZooKeeper, 3 broker, and 6 bookie pods in each AKS cluster, together forming one Pulsar cluster).

```shell
# Apply environment-provided settings to the Pulsar config files.
bin/apply-config-from-env.py conf/client.conf;
bin/apply-config-from-env.py conf/broker.conf;
bin/gen-yml-from-env.py conf/functions_worker.yml;
echo "OK" > "${statusFilePath:-status}";

#/pulsar/keytool/keytool.sh broker ${HOSTNAME}.pulsar-test-broker.abc-pulsar-d9-fleet.svc.cluster.local true;
/pulsar/keytool/keytool.sh broker pulsardev-ha.d9.abc.ad.azdt.abc.com true;

# Wait until the broker's znode from a previous run is gone from ZooKeeper.
timeout 15 bin/pulsar zookeeper-shell -server pulsardev-zk.d9.abc.ad.azdt.abc.com:2281 get /loadbalance/brokers/pulsardev-ha.d9.abc.ad.azdt.abc.com:8080;
while [ $? -eq 0 ]; do
  echo "broker pulsardev-ha.d9.abc.ad.azdt.abc.com znode still exists ... check in 10 seconds ...";
  sleep 10;
  timeout 15 bin/pulsar zookeeper-shell -server pulsardev-zk.d9.abc.ad.azdt.abc.com:2281 get /loadbalance/brokers/pulsardev-ha.d9.abc.ad.azdt.abc.com:8080;
done;
cat conf/pulsar_env.sh;

# Pick the advertised listener from the StatefulSet pod ordinal (the suffix after the last '-').
ORD=$(echo "$HOSTNAME" | rev | cut -d'-' -f1 | rev)
if [ "$ORD" = "0" ]; then
  sed -i "s|^advertisedListeners=|advertisedListeners=internal:pulsar+ssl://100.xx.xxx.36:6651|g" conf/broker.conf;
elif [ "$ORD" = "1" ]; then
  sed -i "s|^advertisedListeners=|advertisedListeners=internal:pulsar+ssl://100.xx.xxx.42:6651|g" conf/broker.conf;
elif [ "$ORD" = "2" ]; then
  sed -i "s|^advertisedListeners=|advertisedListeners=internal:pulsar+ssl://100.xx.xxx.43:6651|g" conf/broker.conf;
fi
OPTS="${OPTS} -Dlog4j2.formatMsgNoLookups=true" exec bin/pulsar broker;
```
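
The ordinal-based branching in the script above can be sketched more compactly. This is only a sketch under stated assumptions, not the Helm chart's own mechanism: `listener_for_ordinal` and `set_advertised_listeners` are hypothetical helper names, and the addresses are the redacted placeholders from the script. Unlike the `sed` calls above, it replaces the whole `advertisedListeners=` line, on the assumption that any value already present on that line should be discarded rather than kept after the new one.

```shell
#!/bin/sh
# Hypothetical helper: map a StatefulSet pod ordinal to its advertised listener.
# The addresses are the redacted placeholders from the original script.
listener_for_ordinal() {
  case "$1" in
    0) echo "internal:pulsar+ssl://100.xx.xxx.36:6651" ;;
    1) echo "internal:pulsar+ssl://100.xx.xxx.42:6651" ;;
    2) echo "internal:pulsar+ssl://100.xx.xxx.43:6651" ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: rewrite the advertisedListeners line of a config file
# based on the ordinal at the end of the pod name.
set_advertised_listeners() {
  conf_file="$1"
  pod_name="$2"
  ordinal="${pod_name##*-}"                  # text after the last '-'
  listener="$(listener_for_ordinal "$ordinal")" || {
    echo "no listener defined for ordinal ${ordinal}" >&2
    return 1
  }
  # Replace the entire line so an existing value is not left appended.
  sed "s|^advertisedListeners=.*|advertisedListeners=${listener}|" "$conf_file" \
    > "${conf_file}.tmp" && mv "${conf_file}.tmp" "$conf_file"
}
```

In the startup script this would be called as `set_advertised_listeners conf/broker.conf "$HOSTNAME"` before `exec bin/pulsar broker`. Failing loudly for an unknown ordinal is a design choice of the sketch; the original script silently leaves the config unchanged in that case.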

SrikanthNampally • Jun 03 '25 00:06