pulsar-helm-chart

Pulsar Manager Persistence in HerdDB

Open Mortom123 opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe. Currently, Pulsar Manager's data is not persisted outside of the container it is running in. This means it has to be reconfigured whenever the Helm chart is reinstalled. This is not a favorable solution.

Describe the solution you'd like We can either provide a custom JDBC connection string pointing to an external storage medium or use ZooKeeper to store this data. I think storing it in ZooKeeper should be the preferred solution. The default value in values.yaml for the key pulsar_manager.configData.URL (as well as any others that need tuning) should be set according to: https://github.com/apache/pulsar-manager#default-test-database-herddb

Describe alternatives you've considered External DB. Keeping the data in ZooKeeper makes ZooKeeper the central point for data, so it should be favored.

Mortom123, Jan 29 '24 15:01

I renamed Zookeeper -> HerdDB in the title. @Mortom123 I guess that's what you meant by persistence in ZooKeeper? HerdDB uses both ZooKeeper and BookKeeper.

lhotari, Feb 15 '24 07:02

#343 has been merged. It provides persistence using a PVC and a PostgreSQL database. I guess that might be sufficient for many use cases.

lhotari, Feb 15 '24 09:02

If it is possible to configure the JDBC URL, then storing data on BookKeeper using HerdDB is easy. HerdDB uses ZooKeeper in the very same way as Pulsar BookKeeper.

eolivelli, Mar 04 '24 06:03

@eolivelli - I thought so too. Ideally, I would like to persist all of the data needed to run Pulsar Manager at a single point in the cluster. I stumbled upon this setting:

# HerdDB - start embedded server 'diskless-cluster' mode, WAL and Data persisted on Bookies, Metadata on ZooKeeper in '/herd', listening on localhost:7000
#spring.datasource.url=jdbc:herddb:zookeeper:localhost:2181?server.start=true&server.base.dir=dbdata&server.mode=diskless-cluster&server.node.id=localhost

I thought that the Pulsar Manager environment variable SPRING_DATASOURCE_URL could be set to something like: jdbc:herddb:zookeeper:zookeeper.service:2181?server.start=true&server.base.dir=dbdata&server.mode=diskless-cluster&server.node.id=(randAlpha10) to enable this. This would mean the Pulsar Manager container itself does not need extra volumes or volume mounts, and the data is kept at a central point in the cluster.
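For illustration, here is a minimal sketch of how that could look as an env entry on the Pulsar Manager container in a Kubernetes pod spec. It is untested and not taken from the chart: the host zookeeper.service is the one from the URL above, and the node id pulsarmgr01 is a hypothetical fixed placeholder standing in for the random 10-letter value.

```yaml
# Hypothetical fragment of the Pulsar Manager container spec (an assumption, not the chart's template).
env:
  - name: SPRING_DATASOURCE_URL
    # server.node.id below is a made-up placeholder; the proposal is to generate it randomly.
    value: "jdbc:herddb:zookeeper:zookeeper.service:2181?server.start=true&server.base.dir=dbdata&server.mode=diskless-cluster&server.node.id=pulsarmgr01"
```

Quoting the value keeps the URL a plain string regardless of the special characters in it.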

Does this work?

Mortom123 avatar Mar 11 '24 15:03 Mortom123
